Memorandum 


DATE: 

TO: 


FROM: 

RE: Verity Due Diligence Consulting


Attached is an extremely sensitive report on the Verity financing from Trident Capital. On July 19th I attended the Trident general partners' meeting, led by General Partner Don Dixon. The report includes a Gartner piece on Fulcrum. The meeting was very lively, with many questions directed to me and to a consultant from Sybase who conducted the technical due diligence. Verity is raising $3M, and the financing is oversubscribed by the original investors; the company is now up to its Series G round. Verity, formed in 1988, is in the full-text database market and has recently hired Philippe Courtot as CEO. Philippe sold cc:Mail to Lotus for $65M; when he joined cc:Mail, it had $1,200 in the bank and 20 employees.


PS - Thanks to Judy for putting the report cover on INPUT stationery!
https://archive.org/details/verityduediligen01unse
Growth of electronic information sources has created
opportunities from individual desktops to
enterprise-wide systems

Why is Verity Attractive to Trident?

What is the Current Status of Verity?

departmental, and enterprise systems

installed base of 650 corporations and government
agencies worldwide

128 employees
Selected Accounts


TOPIC at Work*:
Petrochemicals

Federal Bank of Boston
Phibro Energy

What are the potential returns?
Focus on the less sophisticated customer is now a requirement

• Ease of use is a necessary condition for success

• Mass markets, higher volume, lower prices

• Computer works harder so user doesn't have to

What is Verity's New Product/Marketing

What are TOPIC Libraries?

building blocks for customers to use

FolioViews (for on-line providers)
• Became VP-Sales for North America in Dec '93

• 5 years at Sytek (Hughes LAN Systems) - Mgr of European ops

• BSEE - Kansas State

Will the Company be able to attract a marketing VP?
Draft of 7/13/94


MEMORANDUM OF TERMS
FOR PRIVATE PLACEMENT OF
SERIES G PREFERRED STOCK OF
VERITY, INC.


This memorandum summarizes the principal terms of the Series G Preferred Stock venture capital financing of Verity, Inc.


Offering Terms

Issuer: Verity, Inc., a California corporation (the "Company").

Securities to be Issued: 3,529,412 shares of Series G Preferred Stock and 3,529,412 Common Stock Warrants.

Aggregate Proceeds: $3,000,000.

Price: $0.85 per share for Preferred and $0.23 exercise price for Common.

Expected Closing Date: August 12, 1994

Investors:
    Trident Capital                               $1,500,000
    Existing Verity investors and employees       $1,500,000
                                                  $3,000,000


Terms of Preferred Stock

Dividends: Annual 9% per share dividend on the Preferred Stock (or, if greater, an amount equal to that paid on any other outstanding shares of the Company), payable when and if declared by the Board; dividends are not cumulative. For any other dividends or distributions, Preferred Stock participates with Common Stock on an as-converted basis.


Liquidation Preference: First pay the sum of (i) $0.85, (ii) declared but unpaid dividends, and (iii) an amount equal to $0.085 times the number of years (rounded to the nearest 1/12) which have transpired between the Closing Date and the Liquidation Event on each share of Series G Preferred Stock (the "Series G Preference Amount"). Next pay the holders of the Series A, B, C, D, E and F Preferred according to the existing articles of incorporation. Thereafter, the Series G Preferred Stock and Common share on an as-converted basis (i.e., "Participating Preferred").

A merger, reorganization or other transaction in which control of the Company is transferred will be treated as a liquidation.
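The Series G Preference Amount is plain arithmetic, sketched below in Python. This is an illustration under stated assumptions, not a legal reading of the clause: elapsed time is passed in as a number of years and rounded to the nearest twelfth, declared-but-unpaid dividends default to zero, and the helper names are hypothetical. `Fraction` keeps the dollar amounts exact.

```python
from fractions import Fraction

PURCHASE_PRICE = Fraction(85, 100)      # (i) $0.85 per Series G share
ANNUAL_ACCRETION = Fraction(85, 1000)   # (iii) $0.085 per share per year
SERIES_G_SHARES = 3_529_412             # shares issued in this financing

def series_g_preference(years_elapsed: float,
                        declared_unpaid: Fraction = Fraction(0)) -> Fraction:
    """Per-share Series G Preference Amount (hypothetical helper)."""
    # Round elapsed years to the nearest 1/12; Python's round() sends
    # exact halves to the even number of twelfths.
    years = Fraction(round(years_elapsed * 12), 12)
    return PURCHASE_PRICE + declared_unpaid + ANNUAL_ACCRETION * years

def series_g_first_payout(proceeds: Fraction, years_elapsed: float) -> Fraction:
    """Amount paid to Series G holders before Series A-F receive anything."""
    claim = SERIES_G_SHARES * series_g_preference(years_elapsed)
    return min(proceeds, claim)

print(series_g_preference(2.0))  # 51/50
```

Two years after closing, each share carries $1.02 of preference. What remains after this step goes to Series A-F under the existing articles and is then shared as-converted; those figures are not in this memorandum, so the sketch stops at the Series G layer.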

Redemption: Mandatory redemption after four years, payable in eight equal quarterly installments at the Series G Preference Amount, upon request of 2/3 of the outstanding Series G Preferred Stock.
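A sketch of the redemption mechanics in Python. Two assumptions are not settled by the text: all outstanding Series G shares are redeemed once the 2/3 request threshold is met, and the preference amount is fixed at four years of accretion ($0.85 + 4 x $0.085 = $1.19 per share) rather than continuing to accrue over the eight quarters. The function name is hypothetical.

```python
from fractions import Fraction

OUTSTANDING_SERIES_G = 3_529_412
# $0.85 + 4 years x $0.085 (assumed fixed at the four-year mark)
PREFERENCE_AT_FOUR_YEARS = Fraction(85, 100) + 4 * Fraction(85, 1000)

def redemption_installments(years_since_closing: float,
                            requesting_shares: int) -> list[Fraction]:
    """Eight equal quarterly payments, or [] if redemption cannot be required."""
    if years_since_closing < 4:
        return []
    # Holders of at least 2/3 of the outstanding Series G must request it;
    # compare with integers to avoid rounding error.
    if 3 * requesting_shares < 2 * OUTSTANDING_SERIES_G:
        return []
    total = OUTSTANDING_SERIES_G * PREFERENCE_AT_FOUR_YEARS
    return [total / 8] * 8

schedule = redemption_installments(4, requesting_shares=2_400_000)
print(float(schedule[0]))  # roughly $525,000 per quarter
```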

Conversion: Automatically converts into Common Stock upon consummation of an underwritten public offering with a price per share of at least $2.715 and aggregate proceeds in excess of $7,500,000.
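The automatic-conversion trigger is a two-part test. A minimal Python sketch, assuming the per-share threshold is $2.715 and that "in excess of" means strictly greater than; the function name is hypothetical.

```python
from decimal import Decimal

MIN_IPO_PRICE = Decimal("2.715")        # per-share threshold (assumed reading)
MIN_IPO_PROCEEDS = Decimal("7500000")   # aggregate proceeds must exceed this

def converts_automatically(price_per_share: Decimal,
                           aggregate_proceeds: Decimal) -> bool:
    """True if an underwritten offering forces conversion into Common Stock."""
    return (price_per_share >= MIN_IPO_PRICE
            and aggregate_proceeds > MIN_IPO_PROCEEDS)

print(converts_automatically(Decimal("3.00"), Decimal("12000000")))  # True
```

An offering of exactly $7,500,000 would not qualify under this reading, since the proceeds must exceed that figure.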

Antidilution Adjustments: Conversion ratio adjusted on a narrow weighted average basis in the event of a dilutive issuance. "Dilutive issuance" shall not include up to 1,995,292 shares of Common Stock reserved for future issuance to employees.

Proportional adjustments for stock splits and stock dividends.
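The usual weighted-average adjustment has the form CP2 = CP1 x (A + B) / (A + C), where A is the share count before the new issuance, B is the number of shares the new money would have bought at CP1, and C is the number of new shares actually issued. The memorandum says only "narrow", which in common usage means A counts fewer shares than a broad-based formula would (for example, leaving out options), so A is left to the caller here. The sketch below, with a hypothetical function name and made-up example figures, also skips issuances at or above CP1 and grants falling within the excluded employee reserve.

```python
from fractions import Fraction

def adjusted_conversion_price(cp1: Fraction, shares_before: int,
                              new_shares: int, consideration: Fraction,
                              is_excluded_employee_grant: bool = False) -> Fraction:
    """Weighted-average conversion price after a new issuance (hypothetical).

    cp1           -- conversion price before the issuance (initially $0.85)
    shares_before -- A: shares counted as outstanding (narrow vs. broad is the caller's choice)
    new_shares    -- C: shares actually issued in the new round
    consideration -- total dollars received for the new shares
    """
    # Grants within the 1,995,292-share employee reserve are not dilutive issuances.
    if is_excluded_employee_grant or new_shares == 0:
        return cp1
    # Issuances at or above the current conversion price trigger no adjustment.
    if consideration / new_shares >= cp1:
        return cp1
    b = consideration / cp1  # B: shares the same money would buy at cp1
    return cp1 * (shares_before + b) / (shares_before + new_shares)

# Hypothetical example: 10,000,000 shares before, 2,000,000 new at $0.50.
cp2 = adjusted_conversion_price(Fraction(85, 100), 10_000_000, 2_000_000,
                                Fraction(1_000_000))
print(cp2)  # 19/24
```

The new conversion price is about $0.79, so each Series G share would convert into $0.85 / CP2 = 102/95, or roughly 1.074, shares of Common in this example.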

Voting Rights: Votes on an as-converted basis, but also has class vote as provided by law and on (i) the creation of any senior or pari passu security, (ii) payment of dividends on Common Stock, (iii) repurchase of Common Stock except upon termination of employment, (iv) any transaction in which control of the Company is transferred, (v) an increase in the number of authorized shares of Preferred Stock, and (vi) any adverse change to the rights, preferences and privileges of the Preferred Stock.

Terms of Common Stock Warrants

Exercise Price: $0.23 per share. Holder has the alternative to exercise on a "net" basis.
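A "net" exercise lets the holder pay for the warrant by surrendering part of the shares instead of paying cash. The common formula, assumed here because the memorandum does not spell it out, issues N x (FMV - EP) / FMV shares, where FMV is the fair market value of a Common share at exercise and EP is the $0.23 exercise price. This sketch drops fractional shares, and the function name is hypothetical.

```python
import math
from fractions import Fraction

EXERCISE_PRICE = Fraction(23, 100)  # $0.23 per share

def net_exercise_shares(warrants: int, fair_market_value: Fraction,
                        exercise_price: Fraction = EXERCISE_PRICE) -> int:
    """Common shares issued on a cashless ("net") exercise."""
    if fair_market_value <= exercise_price:
        return 0  # out of the money: nothing to receive
    spread = fair_market_value - exercise_price
    return math.floor(warrants * spread / fair_market_value)

print(net_exercise_shares(1_000, Fraction(230, 100)))  # 900
```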

Antidilution Adjustments: Exercise Price to maintain the same ratio to the Series G Preferred Stock Conversion Price as $0.23 is to $0.85. Number of warrants to change upon a change in exercise price such that the total exercise amount for the Series G Common Stock Warrants remains $811,764.76.
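The warrant adjustment holds two quantities fixed: the 0.23 : 0.85 ratio between exercise price and conversion price, and the total exercise amount of 3,529,412 x $0.23 = $811,764.76. (The share count itself is $3,000,000 / $0.85 rounded: 3,529,412 x $0.85 = $3,000,000.20.) A Python sketch under those readings; the $0.75 conversion price in the example is hypothetical.

```python
import math
from fractions import Fraction

WARRANTS = 3_529_412
BASE_EXERCISE = Fraction(23, 100)    # $0.23
BASE_CONVERSION = Fraction(85, 100)  # $0.85
TOTAL_EXERCISE = WARRANTS * BASE_EXERCISE  # $811,764.76, held constant

def adjust_warrants(new_conversion_price: Fraction) -> tuple[Fraction, Fraction]:
    """Return (new exercise price, new number of warrants)."""
    new_exercise = new_conversion_price * BASE_EXERCISE / BASE_CONVERSION
    return new_exercise, TOTAL_EXERCISE / new_exercise

price, count = adjust_warrants(Fraction(75, 100))
print(round(float(price), 4), math.floor(count))  # 0.2029 4000000
```

The memorandum does not say how fractional warrants are rounded; the example simply floors the count.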

Term: Seven years with no provision for early termination.

Terms of Preferred Stock and Common Stock Warrant Purchase Agreement

Representations and Warranties: Standard representations and warranties by the Company.

Assignment of Inventions and Confidentiality Agreement: Key employees and consultants shall have entered into the Company's standard form inventions and proprietary information agreement.

Expenses: Wilson, Sonsini, Goodrich & Rosati to serve as counsel to the Investors. The Company shall pay reasonable fees (not to exceed $20,000) and expenses of this single Investors' counsel. All Investors will use this counsel.
Registration Rights:

(a) Beginning at the earlier of three years from the closing or three months after the initial registration, two demand registrations upon initiation by holders of at least a majority of the outstanding Preferred Stock for aggregate proceeds in excess of $5,000,000. Expenses paid by Company.

(b) Unlimited piggyback registration rights subject to pro rata cutback at the underwriter's discretion. Full cutback allowed upon IPO; 30% minimum inclusion for the former Series G Preferred Stockholders in IPO or Secondary Offering. Expenses paid by Company.

(c) Unlimited S-3 Registrations of at least $500,000 each upon initiation by holders of 20% of the Preferred. Expenses paid by Company.

Registration rights terminate five years after the initial public offering.

No future registration rights may be granted without the consent of a majority of Investors unless subordinate to Investors' rights.

Market Standoff: 180-day lockup period after date of final prospectus; provided that all shareholders, warrant holders and optionees have entered into similar agreements.

Right of First Refusal: The Investors shall have a pro rata right, based on each Investor's percentage of this financing, to participate in subsequent equity financings of the Company (subject to customary exclusions). Right terminates on and is not applicable to IPO.


Financial Information: The Investors shall receive standard information rights including audited financial reports, quarterly unaudited financial reports, monthly unaudited financial reports, and annual budget and business plan, as well as standard inspection rights. Rights terminate upon IPO.

Board of Directors: Board shall consist of 9 members. Board shall include Trident Capital as the representative of the Series G investors.

Voting Agreement: The Company, the Investors and the shareholders of the Company shall enter into a voting agreement to effect the agreed-upon Board composition.

Post-Closing Capitalization

Series G Preferred Stock outstanding:                          3,529,412 shares
Series F Preferred Stock outstanding:                          1,063,830 shares
Series E Preferred Stock outstanding:                            606,061 shares
Series D Preferred Stock outstanding:                          2,400,000 shares
Series C Preferred Stock outstanding:                          6,891,163 shares
Series B Preferred Stock outstanding:                          4,444,444 shares
Series A Preferred Stock outstanding:                          2,957,000 shares
Series G Common Warrants outstanding:                          3,529,412 shares
Common Stock outstanding:                                      3,596,921 shares
Options and Warrants to Purchase Common Stock outstanding:     8,565,697 shares
Common Stock Options Reserved for Employees:                   1,995,292 shares
Total:                                                        39,579,232 shares
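The capitalization table can be checked mechanically. The Python sketch below copies the figures from the table (the dictionary keys are shortened, hypothetical labels), confirms that the rows sum to the stated total, and computes the fully diluted share of the Series G round. It assumes each preferred share converts into one share of Common, which the table itself does not state.

```python
cap_table = {
    "Series G Preferred": 3_529_412,
    "Series F Preferred": 1_063_830,
    "Series E Preferred": 606_061,
    "Series D Preferred": 2_400_000,
    "Series C Preferred": 6_891_163,
    "Series B Preferred": 4_444_444,
    "Series A Preferred": 2_957_000,
    "Series G Common Warrants": 3_529_412,
    "Common Stock": 3_596_921,
    "Options and Warrants to Purchase Common": 8_565_697,
    "Options Reserved for Employees": 1_995_292,
}

total = sum(cap_table.values())
# Series G preferred plus its warrants, assuming 1:1 conversion into Common.
series_g_stake = (cap_table["Series G Preferred"]
                  + cap_table["Series G Common Warrants"]) / total

print(f"{total:,}")             # 39,579,232
print(f"{series_g_stake:.1%}")  # 17.8%
```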
Other Matters

Common Stock Vesting: [To be supplied by Company].

Restrictions on all Common Stock Transfers:

(a) No transfers allowed prior to vesting.

(b) Company right of first refusal on vested shares until initial public offering.

(c) Investors have right to participate share-for-share in transfers by major shareholders prior to initial public offering.

(d) No transfers or sales permitted during lock-up period of up to 180 days required by underwriters in connection with stock offerings by the Company.

Employee Agreement: Philippe Courtot shall have entered into an Employment and Stock Option Agreement mutually satisfactory to Trident Capital and the Verity Board of Directors.

Closing Conditions: Closing subject to the following conditions:

a. Negotiation of definitive legal documents and completion of legal and financial due diligence by Investors.

b. Execution of indemnification agreement for all officers and directors. Company to purchase D&O insurance, if available for reasonable cost, within one year of closing.

c. Amendment of existing Series A, B, C, D, E and F rights and preferences with regard to [Liquidation Preference], [Conversion], [Adjustments to Conversion Rate], [Voting rights], [Redemption], and [Protection Provisions].
Research Update

• We continue to believe Fulcrum is uniquely positioned as a "pure play" on the rapidly emerging text retrieval applications development business for client/server environments. Fulcrum delivers the requirements demanded by today's applications developers, including:

o Popular Applications Development Environments: Fulcrum offers its SearchTools Software Development Kits (SDKs) for Visual Basic, Visual C++, C++, C, and a soon-to-be-released version for the popular PowerBuilder Enterprise Series (Fulcrum "SearchBuilder"). SearchBuilder is set to ship later this summer (September).

o Tools that are Fast, Powerful and Easy to Use: SearchTools is viewed by most developers as the fastest and most scalable text retrieval solution on the market today. Competing text retrieval vendors (including Oracle, PLS, Conquest, and Verity) are marketing vastly more expensive and complicated APIs requiring significantly more software development time and resources. From the end-users' perspective, Fulcrum's "Intuitive Searching" feature (versus traditional single-word and Boolean searches) permits rapid searching of databases for documents that are similar in content or meaning to the current document or phrase(s) being used as a "search term".

o Adherence to Industry Standards: SearchTools is based on the widely accepted and well-understood query language, SQL. Fulcrum also promotes and supports SGML for structured documents. In addition, the recently introduced SearchTools Version 2.0 (began shipping in mid-June) supports Microsoft's ODBC for integration of text retrieval functionality and structured relational database (including Sybase and Oracle) applications. ODBC allows "power end-users" (who use standard desktop applications such as Microsoft Office and Lotus SmartSuite) and applications developers (who use Visual Basic and 4GLs such as PowerBuilder) to access Fulcrum SearchServer as an ODBC data source.

Information contained in this report is derived from sources we believe to be reliable, but we make no representation that it is accurate or complete. Additional information available. Any opinion in this report reflects our judgment as of this date and is subject to change without notice. Pacific Growth Equities, its employees and affiliates may purchase, hold, or sell the securities of, or perform investment banking services for, any company included in this report. This report is not a solicitation of an offer to buy or

o Internet Support: Many companies are turning to the Internet as a way of communicating with and learning more about their customers. In addition, these same organizations have expressed a strong interest in making their textual information available on the Internet. Fulcrum is working with wide-area network publishing and Internet leader WAIS Inc. to offer a "server-only" version of SearchServer to its large base of resellers and Fortune 1000 customers.


Significant Events

• Lower SDK Pricing Encourages Adoption: Fulcrum released SearchTools Version 2.0 as scheduled on June 15th. Fulcrum has dramatically reduced SDK pricing for SearchTools in order to induce developers to give it a try with very little risk. The average Ful/Text SDK sells for $50K. SearchTools 1.2 SDK pricing was $7,500; Fulcrum now offers SearchTools 2.0 SDKs for $995. Moreover, Fulcrum has bundled its tools, run-time licenses, training and 90 days of support into "Fast-Start Bundles" to encourage small-scale deployment: 10-, 25- and 50-user Fast-Start Bundles sell for $9,999, $19,999 and $39,999. These changes significantly reduce sales lead time and seed the market with SearchTools. Powersoft had a major hit last fall with PowerBuilder 3.0 after it unbundled its tools and dramatically lowered their prices. By comparison, PowerBuilder Desktop ($695, for XBase applications development), PowerMaker ($349, "power-user" forms and report writers) and PowerMaker ($199, an end-user database query tool) all sell for under $1,000. PowerBuilder Enterprise (RDBMS applications development) sells for $3,395.
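The bundle prices imply a per-user cost that falls as the bundle grows. A quick Python check using only the prices quoted above:

```python
sdk_old, sdk_new = 7_500, 995                   # SearchTools 1.2 vs 2.0 SDK ($)
bundles = {10: 9_999, 25: 19_999, 50: 39_999}   # users -> bundle price ($)

per_user = {users: price / users for users, price in bundles.items()}
for users, cost in per_user.items():
    print(f"{users:>2}-user bundle: ${cost:,.2f} per user")

cut = 1 - sdk_new / sdk_old
print(f"SDK price cut: {cut:.0%}")
```

The per-user price drops by about a fifth from the 10-user to the 25-user bundle and stays flat at 50 users, while the SDK itself is roughly 87% cheaper than before.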

• SearchBuilder to Ship Soon: PowerBuilder has emerged as a very popular visual 4GL, the tool of choice today among developers for building high-performance data access applications for client/server environments. Fulcrum SearchBuilder should do well among developers seeking to rapidly deploy powerful text-retrieval applications.

• Quarter Ended June 30: The company should comfortably beat our Q294 revenue and EPS estimates of $5.2M and $0.04 per share. Moreover, the outlook for the remainder of this year and 1995 is very strong. We are likely to raise our FY95 estimates following the June quarter release. Fulcrum reports financial results on August 2.

• Significant New Customer Announcements: We expect Fulcrum to announce major new SearchTools customers and strategic deals for domestic and international markets in the coming few weeks.


Investment Outlook


We continue to strongly recommend purchase of shares in Fulcrum Technologies. The company has comfortably exceeded our revenue and earnings estimates for the past 2 quarters. Fulcrum trades at the low end of the $10-16 range it has traded in since its IPO last November. Despite the recent market correction, particularly for technology issues, investors continue to value high-growth client/server software stocks at 25-30X calendar 1995 earnings estimates, a class of stocks in which Fulcrum clearly belongs.

P/E

Stock        C94    C95
PWRS, PARQ   44X    36X    [illegible]
ATSW         52X    29X
SYBS         35X    26X
FULCF        30X

Fulcrum trades at a discount to this group, even on our conservative financial estimates through calendar 1995. As positive developments and earnings surprises unfold during the next few months, we expect Fulcrum shares to rise. Our target price for the stock is $18-20 per share.
Abstract


An overview of text processing is presented, followed by a discussion of the industry in terms of major application categories, a discussion of Verity's position in the industry, and conclusions.


Introduction


Text processing systems represent a growing market. This document provides some background on text processing, identifies some of the criteria by which such systems may be judged, and contrasts the features and capabilities of several systems with those of Verity's product, Topic.

What is Text Processing

Typical databases are either hierarchical or relational databases of "records", or highly structured, usually tabular data. Recent advances in publishing and authoring technologies have created an explosion of information, most of it in machine-readable format. Most of this information is in the form of documents, which organize information very differently from traditional database systems. The term "unstructured data" has often been applied to databases where the items of interest are documents rather than rows or columns in a table. As such, the term is something of a misnomer, but it draws a necessary distinction between Textual Information Retrieval Systems (TIRS) and the more traditional relational, hierarchical, and network database systems (TDBMS).

The importance of the representation of structure in text will be discussed throughout this document.

Text processing comprises several activities, the most important for the purposes of this research being Capture, Storage, Searching, Presentation, and Applications. Different programs and products address these areas with varying degrees of enthusiasm, but almost all packages must address the issues of storage and searching, and as such these topics demand the greatest attention.

Data Capture

Capture refers to the process of making a body of information available to a TIRS. A body of text which has been made machine-readable is generally referred to as a corpus (plural corpora). The ready availability of desktop publishing has spawned millions of documents rendered as corpora. Yet the vast majority of textual records are still available today only as ink on paper, often stored without indices. Technologies such as Optical Character Recognition (OCR) attempt to automate the conversion from ink on paper to bytes, although achieving high-quality conversion often requires human assistance.

As a result, much of the attention of the authoring world is focused on producing documents which do not require conversion from an image to text. Few if any commercial newspapers use anything but computers for typesetting. Computer networks accept and propagate hundreds to thousands of megabytes of articles, mail, software, and news postings every day. Interestingly, however, until very recently competitive pressures produced a large number of document formats, word processing applications, and information viewers, which range from occasionally interoperable to completely incompatible. Current standards work such as the Unicode standard (for uniformly encoding the character sets of the world's languages), Adobe Acrobat (providing a common document presentation format), and SGML (standardizing formatting, structure, and annotation) are attempts to consolidate the often gratuitous differences between competing commercial formats while providing room for product differentiation.

The challenge for data capture systems is to provide the search engine with relevant information which fits within the bounds of the capabilities of the available storage mechanisms.

Storage

Document storage has traditionally been achieved with file systems. Every popular computer operating system offers at least one file storage system, often multiple systems concurrently, for storing and organizing things we call files. Some operating systems make assumptions about the structure of the contents of files. VMS, for instance, offers files which can contain records similar to those in traditional database systems (RMS files), as well as files whose contents are completely opaque. The important point here is that the operating system has knowledge of the structure. The Unix operating system takes the opposite approach and treats all files as a sequence of bytes. It is up to the programs which use the files to understand their structure. This approach leads to greater generality, often at the cost of more code or limitations on performance optimizations.

With respect to TIRSs, the important fact is the association between a sequence of bytes and a program or function which can interpret those bytes. Programs which attempt to locate keywords on a hard disk and then find the files which contain those keywords must either understand all possible file formats, or else have access to another program or function which does. As an example, consider a search of a disk for the word DOG. It is quite possible that the sequence of characters D-O-G might be in an ASCII file on pet maintenance. It might also just happen to be the binary representation of a machine language operation in the code portion of a DOS .EXE file. It is necessary to indicate the context of a search to avoid addressing storage where the context is inappropriate.
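The DOG example can be reproduced in a few lines of Python. The file names and byte contents below are invented for illustration, and the text-versus-binary test is a crude heuristic (no NUL bytes, decodes as ASCII), not how any particular TIRS decides context.

```python
def looks_like_text(data: bytes) -> bool:
    """Crude context check: NUL bytes and non-ASCII bytes suggest binary."""
    if b"\x00" in data:
        return False
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True

def naive_hits(files: dict[str, bytes], word: bytes) -> list[str]:
    """Match raw bytes anywhere, as a context-free disk scan would."""
    return sorted(name for name, data in files.items() if word in data)

def text_hits(files: dict[str, bytes], word: bytes) -> list[str]:
    """Only search files whose bytes look like plain text."""
    return sorted(name for name, data in files.items()
                  if looks_like_text(data) and word in data)

files = {
    "PETS.TXT": b"Brush your DOG twice a week.",
    # The bytes 0x44 0x4F 0x47 spell "DOG" but here sit inside machine code.
    "GAME.EXE": b"MZ\x90\x00\x03\x00\x44\x4f\x47\xff\xfe",
}

print(naive_hits(files, b"DOG"))  # ['GAME.EXE', 'PETS.TXT']
print(text_hits(files, b"DOG"))   # ['PETS.TXT']
```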

The notion of context is central to all text processing, and consumes much of the energy devoted to the discipline. A TIRS must be flexible enough to permit corpora from many different software packages, encoding formats, and operating system file and directory structures to be referenced and manipulated. Commercial software applications are only now transitioning from monolithic, file-based cooperation schemes to active, message-based cooperation, such as OLE 2.0, DDE, AppleEvents, CORBA, and DCE. Applications in the future will comprise a bundle of software components, individually replaceable and upgradeable at the point of use. At that point, the storage and access of documents with respect to TIRS will become a commodity.

The result of a query into a text database is a set of references to corpora which may have many different structures; one reference might be to a paragraph in a book, another to a page in a magazine, another to a row in a relational database, and so on. For any search engine, the precision of the indexing is determined by the precision of reference provided by the referent storage systems. This precision is often at the level of a file, or may be as detailed as a file name and a byte offset into the file. In other cases, such as a relational database, a database name and query may be required, or even the database name, a table name, and a primary key. Whatever the access scheme, the operational characteristics of the referent systems must be accommodated, or else the integrity of the referent systems can be violated.
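One way to model the mixed result list described above is a small set of reference types, one per level of precision. This is a hypothetical Python sketch; the class names are illustrative and do not come from Topic or any other product.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class FileRef:          # coarsest: the whole file
    path: str

@dataclass(frozen=True)
class FileOffsetRef:    # a file name plus a byte offset into it
    path: str
    offset: int

@dataclass(frozen=True)
class RowRef:           # database, table, and primary key
    database: str
    table: str
    key: str

Reference = Union[FileRef, FileOffsetRef, RowRef]

def describe(ref: Reference) -> str:
    """Render a reference at whatever precision its storage system offers."""
    if isinstance(ref, FileOffsetRef):
        return f"{ref.path} @ byte {ref.offset}"
    if isinstance(ref, RowRef):
        return f"{ref.database}.{ref.table}[{ref.key}]"
    return ref.path

results: list[Reference] = [
    FileRef("/docs/annual.txt"),
    FileOffsetRef("/books/manual.txt", 40_960),
    RowRef("sales", "orders", "10042"),
]
for ref in results:
    print(describe(ref))
```

Resolving a `RowRef` would go through the database rather than the file system, which is where each referent system's own operating rules come into play.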

An example is document security. Most file systems provide file-level security, permitting access to files based upon rules, file ownership, and permissions. A TIRS referencing the files in such a system should take into account the access rights of the user making a query, and exclude from the result list any files which are not accessible by that user. Unfortunately, most systems do not accommodate such security rules. Even more difficult is the case where multiple types of referents are maintained, such as files, relational databases, and network feeds, as each may have its own protection mechanisms, completely unrelated to the others.
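A minimal sketch of security-aware filtering in Python: each kind of referent registers its own access check, and a hit from a source with no registered check is dropped (deny by default). The source labels, users, and permission tables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Hit:
    source: str   # which referent system holds the item, e.g. "fs" or "db"
    ref: str      # source-specific reference

AccessCheck = Callable[[str, str], bool]   # (user, ref) -> may read?

def filter_hits(user: str, hits: list[Hit],
                checks: dict[str, AccessCheck]) -> list[Hit]:
    """Keep only hits the user may read under the owning system's rules."""
    deny: AccessCheck = lambda _user, _ref: False
    return [h for h in hits if checks.get(h.source, deny)(user, h.ref)]

# Hypothetical permission tables for two unrelated protection schemes.
file_readers = {"/pub/readme.txt": {"alice", "bob"},
                "/hr/salaries.txt": {"alice"}}
db_readers = {"alice"}   # whole-database grant

checks: dict[str, AccessCheck] = {
    "fs": lambda user, ref: user in file_readers.get(ref, set()),
    "db": lambda user, ref: user in db_readers,
}

hits = [Hit("fs", "/pub/readme.txt"), Hit("fs", "/hr/salaries.txt"),
        Hit("db", "orders:10042"), Hit("feed", "wire-123")]
print([h.ref for h in filter_hits("bob", hits, checks)])  # ['/pub/readme.txt']
```

Denying by default means a new kind of referent, such as the network feed above, stays invisible until someone writes its access rule.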

Issues of choosing the proper security model, determining the right referencing model, and managing indexes can be reduced to two concepts: structure models and policy models. Structure models detail the organizational structures which a storage system can accommodate, and policy models describe the choices to be selected from among the possible options of a structure model. Efforts such as SGML and Acrobat are attempts to create structure models and provide well-defined policy choices. By supporting the full capabilities of the structure and policy models, application developers can build systems which accommodate a variety of storage models without redundant coding. In an ideal world, the storage system becomes a simple API issue, with the complexity of managing structure and policy models residing in the search component.

Searching

Searching represents the primary task of text processing and, over time, will represent the primary product differentiation between text processing products. Searching involves three fundamental tasks: query expression processing, query optimization, and referent retrieval.
Query expression processing involves the conversion of the request for a search into a search plan. The techniques used to specify a query range from pure point-and-click interfaces to extended SQL language syntax. The important measure of query expression is fidelity to the interface and the intended purpose; most interactive Windows users would prefer the point-and-click interface, while database coders would appreciate extensions to SQL. Underlying all of them, however, is a mathematical algebra which represents the fundamental capabilities of the search engine. Generally, the more powerful the mathematical capabilities, the better the search engine. In practice, however, it is better to make sure that there is a good fit between the sophistication of the query expression and the sophistication of the one posing the query. This is not to say that the same mathematical basis shouldn't apply to both simple and sophisticated queries, but rather that the expression of the queries should meet the expectations of the intended user.

For instance, desktop file indexing programs can probably perform quite well with a simple Boolean operator and keyword model: find ("fred" OR "barney") AND ("Wilma" OR "Betty"). On-line information services may need a sophisticated natural language interface to reach the immense audience such systems have without requiring extensive training on how to compose a query. In both cases, however, translating the query yields a search plan which could be fed to the same engine.
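A minimal Python sketch of the "same engine" idea: the Boolean query is written directly as a nested search plan and evaluated as set algebra against an inverted index (word to set of document ids). A natural-language front end would have to emit the same plan structure. The tuple plan format, the lowercase tokenizer, and the sample documents are assumptions for illustration.

```python
import re
from typing import Union

Plan = Union[str, tuple]   # a term, or ("AND" | "OR", subplan, subplan, ...)

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word to the ids of documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

def evaluate(plan: Plan, index: dict[str, set[int]]) -> set[int]:
    """Evaluate a search plan as set algebra over posting sets."""
    if isinstance(plan, str):
        return set(index.get(plan.lower(), set()))
    op, *args = plan
    sets = [evaluate(arg, index) for arg in args]
    if op == "AND":
        return set.intersection(*sets)
    if op == "OR":
        return set.union(*sets)
    raise ValueError(f"unknown operator {op!r}")

docs = {1: "Fred met Wilma", 2: "Barney and Betty",
        3: "Fred alone", 4: "Betty called Fred"}
plan = ("AND", ("OR", "fred", "barney"), ("OR", "Wilma", "Betty"))
print(sorted(evaluate(plan, build_index(docs))))  # [1, 2, 4]
```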

Searching large amounts of data efficiently requires spending time to
analyze the query to see whether opportunities for optimization exist.
Running a brute-force query over a few megabytes may take only a couple
of seconds and, as such, may not warrant a complex optimization pass. On
the other hand, searching a few terabytes of data certainly cries out
for the very best optimization attempts. The types of optimizations
which can be performed depend entirely upon the nature of the indexing
technique and upon accurate cost estimates of the expected performance
of the storage system. Accurate estimates may be difficult to make in a
highly abstracted storage system where details of document placement and
searchability may be unavailable or impossible to compute. The storage
system must be capable of providing structure models and policy models
from which access cost information can be accurately determined;
otherwise, query optimization will be shoddy at best, especially
compared to RDBMS systems.
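One of the simplest cost-based optimizations is choosing the order in which AND terms are intersected. In the sketch below, the length of each posting list stands in for the access-cost estimate the storage system would ideally supply; the index, the `work` counter, and the function name `intersect` are illustrative assumptions, not a description of any particular product.

```python
def intersect(terms, index, reorder=True):
    """Evaluate terms[0] AND terms[1] AND ... against an inverted index.

    Returns (matching_doc_ids, work), where work counts membership tests:
    a crude stand-in for the access cost a real optimizer would estimate.
    """
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set(), 0
    if reorder:
        # Cost-based heuristic: start with the rarest term so every later
        # step filters the smallest possible candidate set.
        postings.sort(key=len)
    result = set(postings[0])
    work = 0
    for plist in postings[1:]:
        if not result:
            break  # empty intersection: remaining lists need not be read
        work += len(result)
        result = {doc for doc in result if doc in plist}
    return result, work

# Hypothetical posting lists with very different lengths.
INDEX = {
    "the": set(range(1000)),
    "text": set(range(0, 1000, 2)),
    "verity": {4, 8, 9},
}
```

Both orders return the same documents, `{4, 8}`, but the naive left-to-right order performs 1,500 membership tests while the rarest-first order performs 5. The benefit depends entirely on how accurate the size (cost) estimates are, which is the point made above about opaque storage systems.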

An added complication for query processing is the determination of
context. A search for a word in the title of a document may be orders
of magnitude faster than a search for the same word in the body of the
document, because the structure imposed by the notion of a title
constrains the amount of text to be searched. Another kind of context is
represented by constraining a search by the usage of a word,
differentiating the meanings of a word such as "bear." Bear can mean an
animal, the act of carrying something, the act of enduring something,
the act of sustaining a force, etc. The responsibility for understanding
the meaning of a word in context is split between query processing and
data capture. A very good natural language system (NLS) can determine
such usage in a majority of cases, but may fail in critical areas. The
only way to be 100% assured of the correct meaning is to encode the
meaning along with the text, a process called tagging. Tagging can be
thought of as a structuring convention, since such tags might be
embedded in the text, as in SGML, or stored as separate annotations in a
coordinated document. To interpret the context properly, the query
engine must understand the encoding format.
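Structural context can be captured at indexing time by keying each entry on both the field and the word, so that a title-only search touches only title postings. The sketch below assumes documents arrive already split into `title` and `body` fields; the collection and the names `build_field_index` and `search` are hypothetical. The same keying idea could be extended to usage tags (for example, a word plus its tagged sense), although that is not shown here.

```python
from collections import defaultdict

def build_field_index(docs):
    """Map (field, word) -> set of doc ids, so a query can name its context."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(doc_id)
    return index

def search(index, word, fields=("title", "body")):
    """Union of hits for word, restricted to the given fields."""
    hits = set()
    for field in fields:
        hits |= index.get((field, word.lower()), set())
    return hits

# Hypothetical two-document collection.
DOCS = {
    1: {"title": "Bear markets", "body": "prices fall when the bear appears"},
    2: {"title": "Grizzly country", "body": "a bear can bear great weight"},
}
INDEX = build_field_index(DOCS)
```

Here `search(INDEX, "bear", ("title",))` returns `{1}`, while an unrestricted search returns `{1, 2}`.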


Several techniques have become standard for improving search
performance:

  Stop lists            Remove low-relevance words like "the", "an"
  Spelling variations   Understand that "color" == "colour"
  Stemming              Index the base of words, removing endings which
                        indicate part of speech, like -ive, -ly, and so on
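The three techniques above are usually applied as one normalization pass that runs over both the documents at indexing time and the query at search time, so that both sides reduce to the same terms. The sketch below is deliberately naive: the stop list, the spelling table, and the suffix list are small hypothetical samples, and the suffix stripper is not the Porter stemmer or any other published algorithm.

```python
STOP_WORDS = {"the", "a", "an", "of", "and"}            # tiny illustrative stop list
SPELLING = {"colour": "color", "catalogue": "catalog"}  # variant -> canonical form
SUFFIXES = ("ively", "ive", "ly", "ing", "ed", "s")      # naive, checked in order

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Lowercase, drop stop words, canonicalize spelling, then stem."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip('.,;:!?"()')
        if not word or word in STOP_WORDS:
            continue
        word = SPELLING.get(word, word)
        terms.append(stem(word))
    return terms
```

For example, `normalize("The colour of the quickly moving boats")` yields `["color", "quick", "mov", "boat"]`. Stems such as `mov` need not be real words; they only have to match consistently.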

Another technique, called faceting, attempts to structure queries based
on semantic information. A typical faceted system provides several core
words, such as "action", "object", "subject", and "location", and all
documents are indexed based on a key comprising the concatenation of the
core words in the document which match the keywords. For instance, a
typical query of "who shot Roger Rabbit?" would be encoded as:

  action:    "killed" or "shot"
  object:    "Roger Rabbit"
  subject:   *
  location:  *

The results of such queries are typically much better than those of
systems employing simple word searches, and usually better than those of
boolean operator systems. The cost of encoding is very high, however, as
automating the indexing requires both a very sophisticated NLS and
substantial human intervention.

Confidential property of Trident Capital, Inc.
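Once the expensive encoding step has produced facet records, matching a faceted query is a per-facet membership test, with `*` accepting any value. The sketch below encodes the "who shot Roger Rabbit?" query from the text; the three document records are invented for illustration, and a real system would look up the concatenated facet key in an index rather than scan every record as this sketch does.

```python
ANY = "*"

def facet_match(query, doc_facets):
    """True if every non-wildcard facet of the query accepts the doc's value."""
    for facet, accepted in query.items():
        if accepted == ANY:
            continue
        if doc_facets.get(facet) not in accepted:
            return False
    return True

# The "who shot Roger Rabbit?" encoding from the text.
QUERY = {
    "action": {"killed", "shot"},
    "object": {"Roger Rabbit"},
    "subject": ANY,
    "location": ANY,
}

# Hypothetical facet records, assumed to come from the NLS-plus-human encoding step.
DOCS = {
    101: {"action": "shot", "object": "Roger Rabbit",
          "subject": "suspect A", "location": "warehouse"},
    102: {"action": "hired", "object": "Roger Rabbit",
          "subject": "studio", "location": "Hollywood"},
    103: {"action": "shot", "object": "film crew",
          "subject": "suspect B", "location": "warehouse"},
}

hits = sorted(doc_id for doc_id, facets in DOCS.items() if facet_match(QUERY, facets))
```

Only document 101 satisfies both the action and object facets, so `hits` is `[101]`. A plain word search for "shot" and "Roger Rabbit" could not distinguish who did what to whom.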

The Topic system from Verity employs conceptual search, where a concept
is manifested as a topic. A topic represents a statement about the
relevance of a search word to a set of other search words or topics. The
relevance is represented as a rational number between 0 and 1 inclusive.
Operators can be applied between relationships in a topic to add further
relevance. Such operators include the traditional AND and OR, but also
operators such as NEAR, ACCRUE, SOUNDEX (sounds-like), and case
sensitivity. A combination of queries is represented by the composition
of their topics, and optimizations can be performed on the result by
rearranging the relationships in the resulting topic graphs.
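The general shape of such a scheme is a tree of weighted nodes whose scores stay in the range 0 to 1. The sketch below is a hypothetical model only: the combining rules (minimum for AND, maximum for OR, a probabilistic sum for ACCRUE) are assumptions chosen for illustration, not Verity's documented formulas, and `Topic` and `score` are invented names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Topic:
    op: str                  # "WORD", "AND", "OR" or "ACCRUE"
    word: str = ""           # used only when op == "WORD"
    weight: float = 1.0      # relevance of this node to its parent, 0..1
    children: List["Topic"] = field(default_factory=list)

def score(topic: Topic, words: set) -> float:
    """Relevance of a document (given as a set of words) to a topic, in [0, 1]."""
    if topic.op == "WORD":
        base = 1.0 if topic.word in words else 0.0
    else:
        scores = [score(child, words) for child in topic.children]
        if not scores:
            base = 0.0
        elif topic.op == "AND":
            base = min(scores)      # assumed: weakest required piece dominates
        elif topic.op == "OR":
            base = max(scores)      # assumed: best alternative dominates
        elif topic.op == "ACCRUE":
            miss = 1.0              # assumed: each extra hit pushes the score toward 1
            for s in scores:
                miss *= 1.0 - s
            base = 1.0 - miss
        else:
            raise ValueError(f"unknown operator {topic.op!r}")
    return topic.weight * base

# Hypothetical topic: evidence for "text retrieval" accrues from three words.
RETRIEVAL = Topic("ACCRUE", children=[
    Topic("WORD", "index", 0.8),
    Topic("WORD", "search", 0.6),
    Topic("WORD", "retrieval", 0.9),
])
```

Under these assumed rules, a document containing "index" and "search" scores about 0.92 against `RETRIEVAL`, higher than one containing "index" alone (0.8). Because topics are ordinary trees, composing queries amounts to nesting `Topic` nodes, which is what makes graph rearrangement a plausible place for optimization.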

Another interesting approach is represented by Excalibur. Excalibur
treats a document as a sequence of numeric bytes forming a signature,
much like a sonar echo returned from a submarine or the stream of
digital data from a compact disc audio recording. The query engine uses
signal analysis techniques to identify relevant referents. This approach
permits references to any kind of document, but at the same time cannot
take advantage of the contextual constraints provided by understanding
the higher-level structure of textual documents.
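Excalibur's actual signal analysis algorithms are not described here. The sketch below substitutes a much simpler stand-in that shares the key property the text describes: it works on raw bytes with no notion of words or document structure. Each byte string is reduced to counts of overlapping 3-byte windows, and signatures are compared by cosine similarity. The names `signature` and `similarity` are hypothetical.

```python
import math
from collections import Counter

def signature(data: bytes, n: int = 3) -> Counter:
    """Count overlapping n-byte windows; no knowledge of words or structure."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two signatures (0.0 means no shared windows)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0
```

A misspelled probe such as `b"retreival"` still shares some windows with `b"text retrieval systems"` and therefore scores above zero, which hints at why byte-pattern matching tolerates noisy input. By the same token, the method cannot tell a word in a title from the same word in a footnote.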

Regardless of which techniques are used, finding the right set of
documents rests upon finding the right query, which is a process of
trial and error. Even if a query is perfectly expressed, a perusal of the
documents retrieved may change the information being sought, sometimes
by helping to identify further constraints, but often by redirecting the
search to different areas of interest. To expedite the perusal process,
documents must be easily viewed; this is the concern of presentation.

Presentation 

Different documents require different programs to display them. Long
ago, almost all computerized text was encoded as EBCDIC or ASCII
characters with common or similar carriage control characters embedded.
Indexing such text was straightforward, and almost all terminals were
capable of displaying the text in a way which was meaningful, albeit not
necessarily pretty.

The advent of GUI-based windowing systems changed this landscape
drastically, as competing word-processing, spreadsheet, and drawing
packages conceived hundreds of different file formats for their
respective products. The ability to afford a human operator a simple,
effective, graphical interface for such programs almost always came at
the expense of making interactions with other programs difficult or
impossible, except through exchanging files. As such, most cooperative
work between programs was based on writing and reading compatible file
formats.

Consequently, building a TIRS is complicated by the need to orchestrate
the various viewers so that their use is meaningful within the framework
of successively refining a search. Building such a framework has proved
difficult, as the many releases of DDE, OLE, AppleEvents, etc. have
shown. Once it is built, however, viewers can be identified as software
components in the framework and invoked when needed in a straightforward
manner. As cooperative frameworks become more powerful and prevalent,
the problems associated with presentation will diminish.
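Treating viewers as components can be as simple as a registry keyed by file type, with a low-fidelity fallback for anything unrecognized. In the sketch below, the viewers are placeholder functions that only return a description string. A real framework would launch or message an external program (for example over DDE or OLE), and nothing here models those protocols. `ViewerRegistry` and its methods are hypothetical names.

```python
from pathlib import PurePath
from typing import Callable, Dict

Viewer = Callable[[str], str]  # takes a file name, returns a description of what was shown

def plain_text_viewer(name: str) -> str:
    # Lowest-fidelity fallback: always available, never pretty.
    return f"ASCII view of {name}"

class ViewerRegistry:
    """Maps file suffixes to viewer components, with a fallback."""

    def __init__(self, fallback: Viewer = plain_text_viewer):
        self._viewers: Dict[str, Viewer] = {}
        self._fallback = fallback

    def register(self, suffix: str, viewer: Viewer) -> None:
        self._viewers[suffix.lower()] = viewer

    def open(self, name: str) -> str:
        # Dispatch on the file suffix; unknown or missing suffixes use the fallback.
        viewer = self._viewers.get(PurePath(name).suffix.lower(), self._fallback)
        return viewer(name)

registry = ViewerRegistry()
registry.register(".doc", lambda name: f"word-processor view of {name}")
registry.register(".pdf", lambda name: f"PDF view of {name}")
```

The search front end only ever calls `registry.open(...)`, so adding support for a new format means registering one more component rather than changing the search code.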

Another approach to solving the presentation problem is to create a
single document presentation standard and then embed the viewing
software in the application. Adobe's Acrobat is just such a standard.
Such an approach is attractive, as it permits text processing vendors to
focus on text processing as opposed to maintaining code for integrating
foreign viewers. The success of such an approach probably rests upon
Adobe's ability to upgrade and improve Acrobat's features and
presentation quality to match those of products from independent viewer
developers.


Applications 

Incorporating text processing functionality into applications depends
heavily on the application architecture as well as on the text
processing product chosen. Some generalizations can be made, however.

First, the required presentation fidelity will generally determine a
great deal about the application architecture. If high-quality
presentation is required, then high-quality viewers must be invoked, and
such viewers are typically not embeddable into arbitrary application
programs. MS-Word, for instance, is not embeddable. However, it is
messageable, which dictates the use of DDE or OLE. Applications using
such standards must have certain architectural features, such as having
an event loop, responding to certain messages, and being able to accept
callbacks. Should low-quality presentation be sufficient, then referent
text rendered in ASCII could be returned as the result of a text query
and displayed using simple ASCII controls, placed in a file, etc. The
requirements of such systems are vastly simpler than those of
high-fidelity presentation systems. So the rule is: increasing
presentation fidelity is increasingly expensive.
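At the low-fidelity end, "referent text rendered in ASCII" can be as little as a keyword-in-context (KWIC) snippet around each hit, which any text control or file can display. The sketch below is a generic KWIC routine; the function name `kwic` and the `width` parameter are illustrative choices, not part of any product's API.

```python
import re

def kwic(text, term, width=20):
    """Return plain-ASCII keyword-in-context snippets for each whole-word hit of term."""
    snippets = []
    for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        snippet = text[start:end].replace("\n", " ")
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(f"{prefix}{snippet}{suffix}")
    return snippets
```

For example, `kwic("one two three", "two", width=4)` returns `["one two thr..."]`. Compare this handful of lines with the event loop, messaging, and callback machinery needed to drive a full word processor as a viewer.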

Second, most applications are either text-centric or text-enhanced.
Text-centric applications are written around the text search API and
tend to focus on whatever programming metaphor the selected text engine
employs. Other items, such as RDBMS access, are secondary
considerations. Text-enhanced applications are usually legacy
applications, or else applications where the text processing
functionality clearly represents a minority of the problem. Help-desk
applications are text-enhanced, as such an application can clearly be
authored without using a text engine, although its quality increases
dramatically when good text processing capabilities are available.

To accommodate this difference, various text processing vendors have
differentiated their products to address the divergent requirements of
each approach. Systems such as Fulcrum attempt to make text processing
look like an extension of RDBMS programming, hoping to let client/server
applications readily include their API while disturbing program
structure as little as possible. Other systems, such as Excalibur, use a
completely separate API and a substantially different metaphor, and as
such address different markets. Thus, the appeal of these different
approaches varies with the flexibility of the application architecture
and the demand for presentation fidelity.

Third, adding text search to an application brings with it concepts and
features which must be administered and maintained. Adding a text engine
which features document security controlling user access may mean that
the application itself must be upgraded to understand user groups,
capabilities, etc. The full extent of such changes is rarely implemented
properly until several revisions of the application have occurred. As
such, adding text processing capabilities may complicate an application
as much as adding a graphical user interface might, not in terms of
structure, but in terms of features.

Given these generalizations, one can conclude that the bulk of
application upgrades will introduce minimal text processing
functionality in their first release, to avoid the complexities
associated with higher-fidelity features. As understanding increases,
applications will evolve to offer higher-fidelity text processing. New
applications will be written to the same standards and constraints as
before, with the added complexities of text processing rolled in as
well. Based on these generalizations, blanket statements about the
suitability of a particular engine for all applications seem to lack
substance.

Major Application Categories

From a technology standpoint, the market is divided into three main
areas: desktop file combing, workgroup and departmental text processing,
and on-line information services.

Desktop Text

Desktop file combers attempt to help users locate files on a hard disk,
either by file name independent of the directory structure, or by
searching the content of the files for words or phrases. Typically, they
perform searches on hard drives of whatever size is currently popular,
are tailored with limited search engines, and occasionally don't even
use indices. Examples include those listed below:


  Alki Software      Alki Seek 2.1
  Claris             Retrieve It 1.0
  Microlytics        Gofer 2.0
  On Technology      On Location 2.0

They are distinguished by the assumptions that the data is local, so
that concurrency control (usually implemented with file locking) is
unnecessary; that the cost of accessing the hard disk data is
negligible; and that the search is probably the most important thing
happening on the machine at the time, and so can consume as much CPU as
needed.
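Those assumptions are what make an index-free comber viable: it simply reads every file on each search. The sketch below walks a directory tree and returns files containing all of the given words. The function name `comb`, the case-insensitive substring match, and the Latin-1 decoding (which accepts any byte) are illustrative choices, not a description of the products listed above.

```python
import os

def comb(root, words, encoding="latin-1"):
    """Walk every file under root and return the paths containing all words.

    No index is used: every byte of every file is read on each search,
    which is tolerable only because the disk is local and the user is waiting.
    """
    wanted = [w.lower() for w in words]
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding=encoding, errors="ignore") as fh:
                    text = fh.read().lower()
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
            if all(w in text for w in wanted):
                hits.append(path)
    return sorted(hits)
```

Run across a shared file server with many concurrent users, the same loop would violate every one of those assumptions, which is why the workgroup systems described below must be indexed.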

Another important facet of desktop text is CD-ROM search. Such search is
not fundamentally different from searching a hard drive, except that
CD-ROM access times tend to be much longer than those of hard drives,
often slowing the search process. Careful access scheduling can mitigate
much of the performance disadvantage, but the greatest gain is obtained
by indexing.
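Access scheduling usually means serving pending reads in an order that minimizes head travel rather than in arrival order. The sketch below compares arrival order with a SCAN-style (elevator) order: one sweep outward from the current head position, then back. Sector numbers and function names are hypothetical, and real drives add rotational and caching effects that this distance count ignores.

```python
def seek_distance(start, requests):
    """Total head travel (in sectors) to serve requests in the given order."""
    position, total = start, 0
    for sector in requests:
        total += abs(sector - position)
        position = sector
    return total

def elevator_order(start, requests):
    """SCAN-style order: sweep upward from the head position, then back down."""
    up = sorted(s for s in requests if s >= start)
    down = sorted((s for s in requests if s < start), reverse=True)
    return up + down
```

With the head at sector 100 and pending reads `[900, 120, 880, 130, 50]`, arrival order travels 3,170 sectors, while the elevator order `[120, 130, 880, 900, 50]` travels 1,650.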


Workgroup and Departmental Text Search

Workgroup text search is characterized by two separate technologies:
shared file systems and client/server operation.

Shared file system text search is essentially the same as desktop
search, except that the text base is no longer local, but rather is
accessed via a shared file system. This sharing brings with it the
requirements of adhering to file system and resource security models,
potential concurrency bottlenecks, and interrupted data access. Bringing
the entire text base across a network connection is infeasible, so all
workgroup systems must be indexed in some fashion.

Client/server operation mitigates many of these problems by performing
the search close to the data and moving only the minimum data required
from server to client to satisfy a query. The penalty for client/server
operation is the added complexity of installation and administration
compared to simple shared file access. Examples of shared file system
text search products include:


  Virginia Systems Software   Sonar Professional 8.4
  Mainstay                    MarcoPolo 2.0
  KnowledgeSet                GraphicsKRS 1.0
                              Personal Librarian 1.06


Enterprise and Federated On-Line Information Services

On-line Information Retrieval Services (OLIRS) are either client/server
or mainframe-style applications. Such services typically must provide
many different resource usage measurements to facilitate chargeback
accounting, for example connect time, difficulty of search, breadth of
document search, and number of words or pages actually transferred. Such
services also support hundreds or thousands of concurrent users
searching gigantic text bases. Many of the documents referenced from a
large text base may be stored in systems which are currently unavailable
for access (essentially off-line). In addition, different text services
may share indices to data, but not the corpus itself. Systems which
share text indices in such a manner are considered federated. As such,
flexibility in the search engines, both architectural and operational,
is necessary to implement successful large-scale on-line systems.
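In a federated arrangement, an index entry is a reference into some other site's document store, and that store may be unreachable when the query runs. The sketch below searches several shared indices and separates hits that can be retrieved now from those that are listed but off-line. The `(store, doc_id)` reference format, the sample indices, and the name `federated_search` are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Ref = Tuple[str, int]  # (store name, document id) -- hypothetical reference format

def federated_search(term: str, indices: Iterable[Dict[str, Set[Ref]]],
                     online: Set[str]) -> Tuple[Set[Ref], Set[Ref]]:
    """Search every shared index; split hits by whether their store is reachable."""
    refs: Set[Ref] = set()
    for index in indices:
        refs |= index.get(term, set())
    retrievable = {ref for ref in refs if ref[0] in online}
    return retrievable, refs - retrievable

# Hypothetical member indices, each pointing into other sites' document stores.
INDEX_A = {"merger": {("nyc", 1), ("london", 7)}}
INDEX_B = {"merger": {("tokyo", 3)}, "audit": {("nyc", 2)}}
```

Reporting the off-line references, rather than silently dropping them, lets the service tell the user that more material exists and bill or schedule its retrieval separately.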

Large text bases also lead to difficulties with indexing. On-line
customers expect low latency between the arrival of information in the
text base and the integration of that information into the index. As
such, on-line indexing is very important for such systems. For very
large text bases, the incoming data rate may be so high that off-line
indexing is simply impossible. The tradeoffs needed to accommodate large
text bases are different from those for small text bases, just as
database techniques for very large database systems are quite different
from those of dBASE II. The integrity of the index becomes paramount as
well, as the loss of an index halts all searches.
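A common pattern for on-line indexing is to keep a large main index plus a small delta that absorbs new documents. Queries consult both, and the delta is merged into the main index in batches. The sketch below shows that pattern in memory only. A production system would also log each delta update durably before acknowledging it, which speaks to the integrity concern above but is not modeled here. `IncrementalIndex` is a hypothetical name.

```python
from collections import defaultdict

class IncrementalIndex:
    """Main index plus a small in-memory delta that absorbs new documents.

    New documents become searchable as soon as add() returns; merge()
    folds the delta into the main index in one batch.
    """

    def __init__(self):
        self.main = defaultdict(set)
        self.delta = defaultdict(set)

    def add(self, doc_id, text):
        for word in set(text.lower().split()):
            self.delta[word].add(doc_id)

    def search(self, word):
        word = word.lower()
        # .get() avoids creating empty entries in either defaultdict.
        return self.main.get(word, set()) | self.delta.get(word, set())

    def merge(self):
        for word, ids in self.delta.items():
            self.main[word] |= ids
        self.delta.clear()
```

The merge frequency is the tuning knob: merging often keeps the delta cheap to search, while merging rarely reduces the cost of rewriting the main index.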

Some examples of engines suitable for OLIRS are:

  Apple                      AppleSearch
  Blueridge Technology       Optix Network NFS 4.0
  Excalibur Technologies     PixTex/EFS 3.03
  Fulcrum Technologies       SearchTools 1.2
  MicroDynamics              MARS 3.0
  Verity                     Topic 3.1.4


Overview Summary

This section has attempted to provide background and discussion
concerning the technology and operational frameworks associated with
text processing. In the following section, we briefly examine the
technology of Verity and how it fits into the operational frameworks
outlined above.

Success Criteria

To understand Verity's position in the text processing market, it is
necessary to make some generalizations concerning that market. Several
of these have been made in previous sections, but are reiterated here
and specifically applied to Verity.

First, most desktop text systems will become commodity items, in the
same manner that spell checking programs are commodity items. These
low-end packages employ algorithms which are simple enough to duplicate
and place in every application rather than to create a separate
application which provides a single service. It is possible that text
search will become a combination of operating system and associated
bundled application software in the future, as Microsoft has indicated
an intent to pursue such an approach in Cairo. Indeed, the UNIX
operating system traditionally shipped with a comprehensive dictionary,
thesaurus, spell checker, and text formatting tools as part of the
standard distribution, with later additions including a reading level
evaluation and diction tool. In general, however, spell checking became
an OEM problem, or else was implemented in a quick and dirty fashion,
forcing many of the spelling checkers off the market.

Creeping featurism in operating systems, application frameworks such as
OLE, and technology consortiums such as Shamrock, OpenDoc, and DCE will
eventually provide most of the tools necessary for building efficient
text search engines well integrated with the underlying technology. The
quality and fidelity of these systems, however, will be suspect, as
there must be some commercially viable, reputable, and persistent
organization providing support and upgrades for anyone to seriously
consider relying upon such technology. A good example is the GNU
project's C compiler. By most accounts, it is a high-quality, very
flexible, and reliable compiler, for much of its life providing
functionality and performance exceeding those of many hardware and
compiler companies' releases.

Most organizations, especially large ones, refuse to invest in
unsupported software. To answer that need, companies such as Cygnus
were created to provide contract support for the free software released
by the GNU project. While acceptance improved, it never reached the
level of usage for commercial projects that hardware vendors and
compiler companies reached with their products.

To be successful, then, a text search engine company must establish
strong OEM ties with both OS and application vendors, and maintain those
ties by continually upgrading its product. Its prices must be low
enough, and its products of sufficient quality and performance, to keep
OEM partners from switching to other suppliers or inventing their own
solutions.

Likewise, to be successful, a text processing company will need to
exploit a single facet, or at most two facets, of the spectrum of text
processing. The two most likely candidates are data capture and search.
Storage is not viable because companies such as Microsoft and its
competitors need to add storage support features to their operating
systems to show continuous product improvement. This is not to say that
a Microsoft solution is the only one. Rather, the market for storage
improvement will be vastly diminished by the ready availability of such
operating system features, and in any event most of the improvement in
storage performance and features lies in improving the index methods,
which falls within the realm of data capture and search. Presentation is
a lost cause, as all the application frameworks are attempting to solve
this problem, and the influential partnerships have already been
created. There is opportunity in application development, but such
applications will be mostly end-user application development, such as
Fortune 1000 in-house application development, add-ins to application
development systems such as PowerBuilder, and simple front-ends for
client/server systems.

A real threat exists, however, in the few large companies which control
the application infrastructure of the Fortune 1000, such as Oracle and
Microsoft. By treating text processing as another facet of client/server
database computing, they present the highly attractive proposition that
text search can be added with a minimum of retraining and application
rearchitecture. For many applications, such an approach is in fact
feasible, although as the application grows in size and fidelity, more
and more difficulties will arise.

During customer interviews, one report noted that a company which had
been using one engine was probably going to convert to another. This
decision was based on the fact that the text engine company had decided
to use Microsoft's ODBC database interface for its text processing
interface. While the text engine vendor saw this as a great savings in
code and maintenance costs, and a step towards standards compliance, the
customer saw it as a rather gratuitous change which forced application
rearchitecture and induced a performance penalty. The lesson here is
that the APIs and application architectures supported at first release
are very difficult to alter substantially at a later date.

Finally, some companies will fail because they try to do it all, and do
no single part well. The advance of Adobe Acrobat makes significant
investments in presentation components unwise. Companies such as Oracle
will attempt to fit each facet of text processing into their existing
products, innovate where there are holes, and generally provide a level
of fidelity suitable for corporate applications. In addition, they are
no doubt investigating coupling their text engine with their video
server technology to provide TIRS interfaces over video networks.
Combating such infrastructure giants is certain doom.

The areas where smaller players can capitalize and succeed are search
and capture. Search algorithms, support for referencing the myriad file
formats and the emerging Acrobat documents, relational database access
and referencing, and security models for text processing are all areas
which need be authored only once, yet represent substantial investments
for companies focused on providing a larger added value. Products which
can fit into the various application frameworks will succeed, whereas
standalone systems will fail.


Strengths and Challenges for Verity

Verity's strengths lie mostly in their new architecture and their focus
on query processing, which promotes longevity based on the ideas
presented above. While their "topic" technology is sure to be supplanted
by superior technology in several years, they are in a good position to
capture a reasonable portion of the OEM market, based upon their
alliances with Adobe and HP. A closer relationship with Microsoft would
not hurt, but is not necessary for success. Verity has remained focused
on search and capture, and has a better architecture for OEM embedding
than their competitors, due to their early adoption of a multithreaded,
reentrant program. The size of their engine (~600KB) is also attractive
for OEMs, and it will not necessarily require client/server operation,
permitting a single computer to be used as well as addressing
multiprocessors (through process-per-processor client/server).

In addition, Verity has made the corporate transition from a $30K
product to a $1K product, and can, by careful planning and partnering,
accommodate even lower price points.

There are many challenges as well, however. Search technologies such as
LSI (Latent Semantic Indexing) and similar techniques will provide
considerable competition. In a market where missing even one relevant
document might mean success or failure, sensitivity to query precision
and algorithms will continue to increase. Verity must make sure that
their APIs and kernel architecture can survive the addition of, or
conversion to, a very different search model. Companies such as
Excalibur are probably better positioned than Verity to take advantage
of LSI and its successors at this time.

In terms of addressing the existing client/server market, the two
companies to beat are Fulcrum and then Oracle. Fulcrum has placed its
bet on marketing text processing as a database problem, and has
provided enhanced versions of SQL to accommodate text. Oracle has done
the same thing, but will integrate the extensions across its product
line. Fulcrum has approached most RDBMS vendors to provide similar
enhancements using its dialect or variants suitable for each RDBMS
vendor. In any event, Fulcrum has the lead in market share and has
created the perception that text is a server problem. Verity must find a
way to either fit or change that perception.

Finally, overall performance will be a key differentiator. Benchmarking
text retrieval systems is difficult, as the criteria for success depend
heavily on the application. Clearly, however, the greatest performance
increases will come from improving the search algorithms, followed by
tuning the search engine to best utilize the storage engine (such as
aligning text on double-word boundaries, etc.).
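Aligning text on a boundary means padding each record's start offset up to the next multiple of the boundary size, trading a few bytes of padding for aligned memory and disk accesses. The sketch below assumes a 4-byte double word (x86 terminology; other architectures define the term differently) and a power-of-two boundary. `align_up` and `layout` are hypothetical helper names.

```python
def align_up(offset: int, boundary: int = 4) -> int:
    """Round offset up to the next multiple of boundary (must be a power of two)."""
    return (offset + boundary - 1) & ~(boundary - 1)

def layout(lengths, boundary=4):
    """Start offsets for consecutive text records, each aligned to boundary."""
    offsets, pos = [], 0
    for length in lengths:
        pos = align_up(pos, boundary)
        offsets.append(pos)
        pos += length
    return offsets
```

For records of 5, 8, and 3 bytes, `layout([5, 8, 3])` places them at offsets 0, 8, and 16, spending 3 bytes of padding after the first record.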


Conclusions 

The entire text processing industry will undergo a massive shakeout once
established players start offering their solutions. Verity must maintain
partnerships to ride out the shakeout. Their partners will succeed based
on Verity's ability to improve its search technology without revolution.
Partnerships with presentation companies are crucial for success. Verity
must establish some defense against server-centric assaults from Oracle
and Fulcrum, or else they risk being disregarded by the RDBMS community.
The underlying technology of "topics" provides as good a model as most
for constraining searches, and does not represent a major advantage or
disadvantage at this time. Continued development of query processing and
optimization techniques is vital. To address on-line information
services, partnerships with natural language understanding systems will
need to be established within 12-18 months (and are currently being
investigated).

BY KRISTI COALE


Not long ago, finding documents with information you needed meant
looking through bankers boxes full of musty file folders, leafing
through the Readers' Guide to Periodical Literature, or making yourself
queasy skimming rolls of microfiche just to find two paragraphs relevant
to your search.

Fortunately, recapturing the past is easier these days, thanks to
special software for archiving and retrieving text. The primary function
of text-archiving-and-retrieval software is to store text on your
computer and to find it for you later. The most basic text-retrieval
packages, such as Claris's Retrieve It, Alki Seek, Microlytics' Gofer,
and On Technology's On Location, respond to your typing in a word or
several words (jazz and Charlie Parker, for example) by listing all
files on the volume you're searching that contain those words. These
packages are well suited for individuals who must often dredge up
half-remembered facts or deeply nested files.

But many retrieval packages go far beyond simply finding a few words. The high-end packages, as well as On Location, preindex text or store it in an archive (a text database) to speed up the search. Many, including Blueridge Technology's Optix Network NLS (Natural Language System), Apple's AppleSearch, and Mainstay's MarcoPolo, are designed for network access to an organization's data. In fact, some products, such as Verity's Topic and Excalibur Technologies' PixTex/EFS (Pictures and Text/Electronic Filing System), even let Mac clients access data stored on a non-Mac server. Others, including KnowledgeSet's GraphicKRS (Knowledge Retrieval System) and Fulcrum Technologies' SearchTools, actually include programming tools for designing a custom interface. That is especially useful for distributing your organization's archives, or even selling information, on CD-ROM.

Advanced  Search  Features 

A simple word search can return a list of documents so long you'll wish you were back at the microfiche reader. Boolean operators (AND, OR, and NOT) and proximity operators are useful for zeroing in on the text. For instance, a search for Jazz AND Monk WITHIN 10 New York yields documents containing both Jazz and Monk, but only if these two words are within ten words of New York.
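The following sketch shows how a query like Jazz AND Monk WITHIN 10 New York might be evaluated. It assumes that "within 10" means the word positions differ by at most 10, and that documents sit in an in-memory dictionary. The names `and_within`, `near`, and `phrase_starts` are illustrative and do not come from any of the products reviewed here.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_starts(tokens: list[str], phrase: str) -> list[int]:
    """Positions where a (possibly multiword) phrase begins."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(tokens: list[str], term: str, anchor: str, distance: int) -> bool:
    """True if `term` occurs within `distance` word positions of `anchor`."""
    anchor_len = len(tokenize(anchor))
    for t in phrase_starts(tokens, term):
        for a in phrase_starts(tokens, anchor):
            # distance to the nearer edge of the anchor phrase
            gap = a - t if t < a else t - (a + anchor_len - 1)
            if gap <= distance:
                return True
    return False


def and_within(docs: dict[str, str], terms: list[str], anchor: str, distance: int) -> list[str]:
    """Documents containing every term (AND), each within `distance` of the anchor."""
    hits = []
    for name, text in docs.items():
        tokens = tokenize(text)
        if all(near(tokens, term, anchor, distance) for term in terms):
            hits.append(name)
    return hits
```

For example, `and_within(docs, ["jazz", "monk"], "new york", 10)` keeps only documents in which both words fall close to New York.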

Wild-card searching and fuzzy searching, which looks for approximate spellings, come in handy when you don't know the exact spelling of a word, or if you're looking for related words. For instance, Bio* (the asterisk is a common wild-card symbol) might find Biology, Biochemistry, and Biotechnology.
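Here is one way to implement both techniques. Python's `fnmatch` shell-style patterns stand in for the products' wild-card syntax. Fuzzy matching uses edit (Levenshtein) distance, which is a common method; the article does not say which method any package actually uses.

```python
from fnmatch import fnmatchcase


def wildcard(pattern: str, vocabulary: list[str]) -> list[str]:
    """Words matching a shell-style pattern such as 'Bio*' (case-insensitive)."""
    return [w for w in vocabulary if fnmatchcase(w.lower(), pattern.lower())]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions needed."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fuzzy(word: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    """Words spelled within max_edits of the query word."""
    return [w for w in vocabulary if edit_distance(word.lower(), w.lower()) <= max_edits]
```

With a vocabulary of Biology, Biochemistry, Biotechnology, and Geology, `wildcard("Bio*", ...)` returns the first three. The misspelling `fuzzy("Biolgy", ...)` returns only Biology.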

Some high-end packages, such as AppleSearch, Personal Library Software's Personal Librarian, and Optix Network NLS, provide a natural-language search engine, so you can type what you want to find in plain English. If you look for the phrase "articles about modern music," the program might return files with information about the Spin Doctors, jazz, and Branford Marsalis. Natural-language searching understands relationships between words in one of two ways: by looking them up in a built-in dictionary and thesaurus, or by analyzing the contents of documents and finding words that frequently occur together.
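This minimal sketch illustrates the second approach. It counts which words share documents with a query word. It is a crude stand-in: real engines weight such statistics far more carefully, and none of the named products is documented to work exactly this way. The stop-word list is an illustrative assumption.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to"}  # illustrative list


def words(text: str) -> set[str]:
    """Distinct non-stop words in a document."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def related_terms(docs: list[str], term: str, top: int = 3) -> list[str]:
    """Words that most often appear in the same documents as `term`."""
    counts = Counter()
    for text in docs:
        found = words(text)
        if term in found:
            counts.update(found - {term})
    return [w for w, _ in counts.most_common(top)]
```

If most documents about baseball also mention an umpire, `related_terms(docs, "baseball", top=1)` returns `["umpire"]`. A program could then use that word to widen the search.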

Even with all these search tools, you can wind up with a lengthy list of retrieved files. If you didn't create the files yourself, you have no way of knowing which make only passing reference to the search term and which are useful. Fortunately, some natural-language systems figure this out for you and rank the files according to their relevance.
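To show what relevance ranking can mean in practice, here is a sketch using TF-IDF, a standard textbook scoring scheme. The article does not describe any product's ranking formula, so treat this as one plausible approach. A file scores higher when the query words make up more of its text, and rare query words count for more than common ones.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def rank(docs: dict[str, str], query: str) -> list[str]:
    """Names of matching documents, most relevant first (simple TF-IDF)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for name, tf in counts.items():
        length = sum(tf.values()) or 1
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                df = sum(1 for c in counts.values() if term in c)  # docs containing term
                # share of this document taken by the term, damped by how common it is overall
                score += (tf[term] / length) * math.log(n_docs / df)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)
```

A document that mentions a term only in passing still appears in the result, but below documents that are about the term.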

Finding  Your  Search  Tool 

Searching for a text-retrieval package can be about as complicated as searching for documents, so decide what you want to accomplish before you choose a package.

For instance, if you have an 80MB hard disk and your filing system is totally disorganized, a basic retrieval package can help you find lost files more easily. Alki Seek, Retrieve It, Gofer, and On Location are geared for this. All four packages provide the basic Boolean operators, but beyond that there's a big gap in offerings.

Retrieve It has the most basic set of search tools: Booleans and proximity searching. However, it does provide a smorgasbord of proximity searches, letting you look for words before or after other words, for example. It can pause a search in progress or search in the background.

Gofer is slightly better. It lets you search for up to eight words separated by the Boolean and proximity operators, and it adds wild-card searches. You can search in the background with Gofer and search for near spellings. However, Gofer doesn't run on the 68040 processor and Microlytics has no plans to update the program, so avoid Gofer if you own or plan to buy a newer Mac.

Gofer and Retrieve It are outdone by Alki Seek. Seek offers more text-search criteria than you ever thought possible. Its well-designed query window won't overwhelm you, and its Banter Box describes in plain English what you're searching for. You can save and reuse your search criteria. Seek also lets you open up a document and view it in its original format: Seek supports XTND as well as formats of major programs like Word and PageMaker. (Retrieve It and Gofer can show the text of found documents without formatting.) However, Seek can't search in the background, and it doesn't have relevance ranking.

These three packages are painfully slow for large amounts of text because they have to read through the actual text. In an informal test on a PowerBook, I searched for Macintosh AND Centris in roughly 377MB of text from back issues of Macworld. Seek took about 23 minutes, and Gofer took nearly 30 minutes. Retrieve It had the sense to quit after 20 minutes and chastise me for my search criteria.

Programs that index all the text and then search the index instead of the original documents are much faster. On Location performed the same informal test in 516 minutes. Indexing saves searching time, but it costs hard disk space: some programs' indexes are about 1 percent of the size of the data you're indexing, but others can be 100 percent.
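A minimal sketch of the indexing idea follows. One pass over the files builds a map from each word to the files that contain it. After that, a Boolean AND is answered by intersecting two sets rather than rereading every file. This index records only which files contain each word, so it stays small. Indexes that also store word positions (needed for proximity operators) or copies of the text can be considerably larger. The function names are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """One pass over every file: map each word to the files containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Boolean AND answered from the index alone, without rereading any file."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()
```

The expensive pass happens once. Each later query touches only the posting sets for its own words.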

On Location has a nice feature: it can index in the background and update the index automatically when you modify your files. On Location's indexing outweighs its sparse set of search criteria and lack of relevance ranking, and makes On Location the best basic package. However, Alki Seek should be upgraded with indexing and background searching by the time you read this (see "Fast Find").

Sharing  Your  Knowledge  Base 

Text-archiving-and-retrieval tools really come into their own when your workgroup or company can share information on a Mac network. Virginia Systems Software's Sonar and Sonar Professional, MarcoPolo, GraphicKRS, and Personal Librarian make it easy to do this.

Both Sonars have the standard Boolean search operators, fuzzy and proximity searching, wild-card searching, and relevance ranking. Sonar Pro can also search phonetically, finding words that sound like the search terms. It also lets you pause in the middle of a query, conduct another search, and then resume the previous search. GraphicKRS also has this branching search capability.

GraphicKRS moves the text to be searched into a separate archive. Tools for programmers let you build a complete application around this archive; for example, you can build an interface and set up hypertext links between parts of documents. GraphicKRS also imports graphics embedded in documents into the archive. Sonar Pro has a hypertext feature but isn't programmable. Both packages record the current session's search path, so you can easily retrace your steps.

GraphicKRS has wild-card and Boolean searching, but Sonar Pro adds concept searching. This technique is similar to natural-language searching but requires the user to define the relationships between words. Concept searching makes Sonar Pro a better retrieval tool than GraphicKRS, but Sonar Pro is outdone by Personal Librarian's natural-language query capability. Personal Librarian looks for statistical relationships between words in the documents it retrieves. For example, if you tell it to look for baseball, it will notice that many of those documents contain the word umpire. It may then start to turn up documents containing the word umpire that don't mention baseball. Personal Librarian also has fuzzy and wild-card searches, and ranks the files it finds for you.

MarcoPolo stands out from this group for several reasons. The Sonars, GraphicKRS, and Personal Librarian use System 7 file sharing to run on a network, and they don't track who owns documents. MarcoPolo has its own networking scheme with built-in access management and document tracking. Through a special print driver, MarcoPolo includes a PICT image of documents in the database as it archives them. It can also do subqueries, that is, query the list of files found by a previous search. But MarcoPolo lacks natural-language searching and relevance ranking, and is not programmable (see "MarcoPolo's Picture File").

MarcoPolo has a nice feel and is very easy to set up, thanks to straightforward manuals. Still, the nod for these packages goes to Personal Librarian, because of its natural-language querying and relevance ranking (see "Relevant Details"). Second place goes to Sonar Professional. Its confusing manuals make setting up the program difficult, but Sonar Pro has some great extras, such as sticky notes that you can add to documents and that also become part of the index for searching. In my informal speed test, Sonar took a bit under three minutes, and MarcoPolo took just under four.

MarcoPolo's Picture File: MarcoPolo's special print driver archives an image of your Mac-generated files along with the text they contain, so you can see thumbnails of retrieved files while you read them.

Relevant Details: Personal Librarian has ranked this group of 27 files by their relevance to the BCCI banking scandal, and the histogram-like display below shows how relevant each file is.

Fast Find: On Location's limited search criteria make it hard to filter out files that are marginally relevant to your query, but its index does make retrieval very rapid.

Serving Up Information

For very demanding situations, you need to look at client-server systems. Three client-server systems can be all-Mac: Optix, MARS (Microdynamics Archiving and Retrieval System), and AppleSearch. In several others the Mac's role is only as a client on a VMS, Unix, OS/2, or mainframe server.

Text-Archiving Systems at a Glance: Client/Server Systems

Personal Librarian 1.06. Personal Library Software; phone 301/990-1155. List price: $995. Index/archive space: 60% of original data. Multiuser support: System 7 file sharing, Unix, VMS servers. Imaging: no.
AppleSearch. Apple Computer; phone 408/996-1010, toll-free 800/635-9550. List price: starts at $1799. Index/archive space: 100% of original data. Multiuser support: Mac OS, A/UX servers. Imaging: no.
Optix Network NLS 4.0. Blueridge Technology; phone 703/675-3015. List price: $75,000*. Index/archive space: 2K per page. Multiuser support: Unix server. Imaging: yes.
PixTex/EFS 3.03. Excalibur Technologies; phone 619/625-7900, toll-free 800/788-7758. List price: starts at $12,000*. Index/archive space: ¼ of original data. Multiuser support: Unix, VMS servers. Imaging: yes.
SearchTools 1.2. Fulcrum Technologies; phone 613/238-1761, toll-free 800/385-2786. List price: starts at $1000 per user*. Index/archive space: 85% of original data. Multiuser support: internal, Unix, OS/2 servers. Imaging: yes.
MARS 3.0. Micro Dynamics; phone 301/589-6300. List price: starts at $70,000*. Index/archive space: 100% of original data. Multiuser support: Mac OS server. Imaging: yes.
Topic 3.1.4. Verity; phone 415/960-7600. List price: starts at $30,000*. Index/archive space: 65%-70% of original data. Multiuser support: DOS, Unix, VMS, OS/2 servers. Imaging: yes.

* Price varies with configuration and license.

Text-Archiving Systems at a Glance: Small Systems and Multiuser Systems

Small systems
Alki Seek 2.1. Alki Software; phone 206/286-2600, toll-free 800/669-9673. List price: $39.95. Query methods: Boolean, wild card, keyword; saves queries; autoquery. Index/archive space: not applicable (doesn't create an index or archive). Multiuser support: System 7 file sharing. Imaging: no.
Retrieve It 1.0. Claris; phone 408/727-8227, toll-free 800/325-2747. List price: $69. Query methods: Boolean, proximity, keyword. Index/archive space: not applicable (doesn't create an index or archive). Multiuser support: System 7 file sharing. Imaging: no.
Gofer 2.0. Microlytics; phone 716/248-9150. List price: $39.95. Query methods: Boolean, fuzzy, proximity, wild card, keyword; saves queries. Index/archive space: not applicable (doesn't create an index or archive). Multiuser support: System 7 file sharing. Imaging: no.
On Location 2.0.1. On Technology; phone 617/374-1400, toll-free 800/548-8871. List price: $129. Query methods: Boolean, fuzzy, wild card, keyword. Index/archive space: 2%-4% of original data. Multiuser support: System 7 file sharing. Imaging: no.

Multiuser systems
Sonar Professional 8.4. Virginia Systems Software; phone 804/739-3200. List price: $795 (a $295 version, Sonar 8.4, lacks phonetic and root-search queries, and it cannot save queries). Query methods: Boolean, fuzzy, proximity, phonetic, wild card, keyword, root search, concept; saves queries. Index/archive space: 60%-100% of original data. Multiuser support: System 7 file sharing. Imaging: Sonar Image (separate product).
MarcoPolo 2.0. Mainstay; phone 805/484-9400. List price: $395. Query methods: Boolean, fuzzy, wild card, keyword, concept; saves queries; autoquery. Index/archive space: 1% of original data. Multiuser support: internal. Imaging: yes.
GraphicKRS 1.0. KnowledgeSet; phone 408/738-3400, toll-free 800/456-0469. List price: $15,000 (price varies with configuration and license). Query methods: Boolean, proximity, wild card, keyword; saves queries. Index/archive space: 400% of original text. Multiuser support: System 7 file sharing, internal, Unix server. Imaging: no.
HOW TO MANAGE PAPER DOCUMENTS

Searching text files is fine when your data is on the computer, but chances are you have a mountain of papers stored away too. If your organization needs to convert paper files to searchable computer format, you might consider a document-imaging system. A document-imaging system scans paper files and lets you add keywords to the scans to facilitate retrieving the documents later. When you conduct a search, the system displays the image of the original document. Most document-imaging packages use OCR software to convert scanned documents into searchable text (see "OCR: The Recognition You Deserve," Macworld, November 1993). Optix Network NLS and MARS, among other products in this article, are geared toward document imaging. The topic requires an article to itself, so look for more coverage in an upcoming issue of Macworld.


Searching in English: Optix's Natural Language Search lets you fine-tune how it interprets your query terms. In this example, Optix provides a list (under Definitions/Expansions) of nuances associated with the search term RUN.

Matching Patterns: PixTex/EFS looks for strings of letters instead of entire words, so it can find Charlemagne even though the search term is misspelled as charlamain.

These systems are for large organizations with many users who need simultaneous access. For instance, a company might have a help desk that receives calls about its products. It needs to track problems and make the solutions accessible to its technicians. As records of calls build up, a wealth of information accumulates in the retrieval system. A technician can then type in a description of a problem and get references to previous solutions.

AppleSearch is based on Personal Librarian's search engine, so it incorporates natural-language searching. Like Personal Librarian, AppleSearch saves queries, but it goes a step further. You can set an AppleSearch query (called a Reporter) to run automatically on a schedule, for example, to search information downloaded from the wire services once a day. Results are ranked by relevance.
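The article describes Reporters only at this level, so here is a minimal sketch of the general idea, not of AppleSearch itself. A stored query is re-run against each new batch of documents. It reports only documents it has not shown before, ranked crudely by how many query terms they contain. `SavedQuery` is hypothetical. The scheduling itself (for example, a daily job) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """A stored query re-run on a schedule (hypothetical, Reporter-like)."""
    terms: list[str]
    seen: set[str] = field(default_factory=set)

    def run(self, docs: dict[str, str]) -> list[str]:
        """Return documents not reported before, ranked by matching terms."""
        hits = []
        for name, text in docs.items():
            if name in self.seen:
                continue
            words = set(text.lower().split())
            score = sum(term.lower() in words for term in self.terms)
            if score:
                hits.append((score, name))
        hits.sort(key=lambda hit: hit[0], reverse=True)
        self.seen.update(name for _, name in hits)
        return [name for _, name in hits]
```

Calling `run` each day on that day's downloaded wire stories yields a short, ranked list of new items only.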

Optix Network NLS can run as a single-user system but is designed for Macintosh clients on a Unix server. Optix's natural-language engine is a little different from AppleSearch's or Personal Librarian's: it uses a dictionary and a thesaurus to analyze the words in a query and draw relationships between them. You can control how sensitive Optix is to finding these relationships (see "Searching in English").
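This sketch shows thesaurus-driven query expansion. A `depth` parameter plays the role of the sensitivity control: each additional level pulls in more distantly related words. `THESAURUS` is a tiny made-up table, and this is an illustration of the general technique rather than Optix's actual algorithm.

```python
THESAURUS = {  # hypothetical, tiny thesaurus
    "run": ["sprint", "operate", "manage"],
    "operate": ["work", "function"],
    "manage": ["direct"],
}


def expand(term: str, depth: int) -> set[str]:
    """Return the term plus related words up to `depth` thesaurus hops away."""
    seen = {term}
    frontier = [term]
    for _ in range(depth):
        next_frontier = []
        for word in frontier:
            for related in THESAURUS.get(word, []):
                if related not in seen:
                    seen.add(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return seen
```

At depth 1, RUN expands to sprint, operate, and manage. At depth 2 it also reaches work, function, and direct, which widens the search but admits looser matches.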

Verity's Topic, with a Mac client that can access servers on several platforms, is for people who have a good idea what they are looking for and how to ask for it. Topic uses concept searching, so you (or an expert in the subject you are investigating) must define the queries, or topics. You do this by setting up hierarchical relationships between search words and weighting each part of the query. As you add new subjects and words to a query, you can change the relationships to reflect any new emphasis you might want to underscore. Topic ranks finds in order of importance. Verity also sells a module, called Topic Real-Time, that searches real-time data feeds such as news wires.
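To make "hierarchical relationships" and "weighting" concrete, here is a sketch of a weighted topic tree. The scoring rule is an assumption chosen for simplicity: a leaf scores its weight if its word appears, and an interior node scores the weighted share of its subtopics that matched. This is not Verity's documented method, and `TopicNode` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One node of a hypothetical weighted topic tree."""
    name: str
    weight: float = 1.0
    children: list["TopicNode"] = field(default_factory=list)

    def score(self, words: set[str]) -> float:
        if not self.children:  # leaf: a single search word
            return self.weight if self.name in words else 0.0
        # interior node: weighted share of its subtopics that matched
        matched = sum(child.score(words) for child in self.children)
        possible = sum(child.weight for child in self.children)
        return self.weight * matched / possible


def rank(topic: TopicNode, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Documents with a nonzero score, best first."""
    scored = [(name, topic.score(set(text.lower().split()))) for name, text in docs.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)
```

A "jazz" topic might give bebop weight 0.8, monk 0.7, and saxophone 0.5. Raising a subtopic's weight later shifts the ranking toward documents that match it, without rewriting the rest of the tree.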

Excalibur's PixTex/EFS runs as a Mac client on Unix and VMS servers. PixTex specializes in pattern recognition. It searches at a fine resolution that ignores words and looks for letters: it compares every sequence of characters in your query against every sequence of characters in the documents. This makes phonetic and fuzzy searching unnecessary. You can even type in approximate spellings of words, and the program finds anything having similar sequences of characters and ranks the results by relevance (see "Matching Patterns").
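The article does not describe PixTex's internal method. This sketch uses a common stand-in for character-sequence matching: break each word into overlapping three-letter sequences (trigrams) and compare the sets with the Dice coefficient. A misspelling still shares many trigrams with the intended word.

```python
def trigrams(word: str) -> set[str]:
    """Overlapping 3-letter sequences, padded so word edges count too."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Dice coefficient of the two words' trigram sets (1.0 = identical)."""
    ga, gb = trigrams(a), trigrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def best_matches(query: str, vocabulary: list[str]) -> list[str]:
    """Vocabulary words ordered from most to least similar to the query."""
    return sorted(vocabulary, key=lambda w: similarity(query, w), reverse=True)
```

With the misspelled query charlamain, Charlemagne shares five of its opening trigrams and ranks first among Charlemagne, Carolingian, and Lothair.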


Micro Dynamics' MARS is a document-imaging system with a text-search module, FreeForm, that would be impressive as a stand-alone program but is not sold separately. Micro Dynamics started out selling turnkey systems, complete with a server, storage, and Macs with preinstalled software, though the company now sells the software without hardware. MARS's FreeForm offers up to eight levels of nested search terms and can perform automatic searches that are similar to AppleSearch's Reporters.

Fulcrum's SearchTools is a programmable system, like GraphicKRS, that is really designed for developing in-house text-retrieval systems and searchable CD-ROMs. SearchTools adds a new wrinkle: it incorporates a version of SQL, the universal database query language, extended with commands specific to text-retrieval operations. SearchTools runs as a Macintosh client on Unix or OS/2 servers (a Windows NT server is also in development).

With such a wide variety of search tools available, there is certain to be one that can help you find all the information you need. Now if someone would just hurry up and convert all those reams of microfiche into digital text.


KRISTI COALE, based in Redwood City, California, wishes text-retrieval technology had been available sooner to save her from years of microfiche sickness.


TEXT-RETRIEVAL SOFTWARE

EDITORS' CHOICE


The client-server tools are too complex for us to make a call, but among products geared for the casual user and for workgroups, two products stand out.


Small  System 

On Location
On Location is very fast and barely requires maintenance, because once you create an index it updates automatically every time you modify or create documents.
Company: On Technology.
List price: $129


Multiuser  System 

Personal Librarian
Personal Librarian's statistics-based natural-language querying helps you find not only what you know to search for, but also related topics.
Company: Personal Library Software.
List price: $995
GartnerGroup
Continuous Services
Office Information Systems (OIS)
Research Note, Products, P-FUL-1114
J. Popkin
March 22, 1993
REPRINT

SearchTools
Fulcrum Technologies Inc.
Ottawa, Ontario
(613) 238-1761

Figure 1. SearchTools Architecture (layers: Applications; SearchTools for Windows, Visual Basic, Unix; SearchSQL; SearchServer; Table, Full-text Index, Native Document Format). Source: Fulcrum, Gartner Group

Fulcrum's SearchTools: New Era in Text Retrieval


The  era  of  stodgy  text-retrieval  applications  is  over. 
Fulcrum’s  SearchTools  will  bring  text  retrieval  to  the  wide 
world  of  SQL  application  developers.  SearchTools  are  easy 
to  use  and  priced  for  wide  distribution. 


Fulcrum  Technologies  Inc.’s  March  22,  1993,  announcements 
not  only  introduce  a new  product,  but  also  mark  a fundamental 
change  in  Fulcrum’s  marketing  and  distribution  strategy. 
SearchTools,  a family  of  SQL-based  text-retrieval  application 
development  tools,  will  catapult  Fulcrum’s  Ful/Text  technology 
into  the  mainstream  of  corporate  application  development.  The 
long-term  benefit  to  developers  and  users  is  obvious:  the  rapid 
development  of  powerful  and  inexpensive  text-retrieval 
applications  (see  Figure  1).  For  Fulcrum,  this  is  like  supply-side 
economics: lower the price, and high-volume sales will increase
total revenue. We believe it will be successful, and therein lies
a risk.  Fulcrum  has  traditionally  marketed  the  Ful/Text 
technology  solely  through  OEMs,  and  thus  had  a high  degree  of 
control  over  the  quality  of  the  applications  its  technology 
supported.  With  this  new,  direct  sale  to  corporate  developers, 
the  quality  of  end-user  applications  will  become  variable  and  will 
no  longer  be  under  Fulcrum’s  direct  control. 

The  SearchTools  product  family  delivers  a platform  for  the 
development  of  client/server  text-retrieval  applications,  e.g.,  on- 
line access  to  emergency-response  procedures  and  technical 
documentation.  The  SearchTools  family  includes:  SearchTools, 
a SQL-based  developer’s  toolkit;  SearchServer,  a run-time 
indexing and retrieval engine; and SearchSQL, a query language
based on an evolving standard, Structured Full-text Query
Language (SFQL), for extending SQL to text retrieval.

SearchTools are based on a familiar relational database
management system (RDBMS) paradigm. They allow developers to
access unstructured text data with the same access method,
ISO/ANSI Structured Query Language (SQL), used for
structured data. The difference is that while the text data is
organized into table-like structures for indexing, documents are
also stored in their native, revisable format (see Figure 1). This
is unlike the approaches taken by other text-retrieval vendors,
which often either store text BLOBs in RDBMS databases
without SQL access, or store ASCII text in proprietary search
and retrieval engines. Fulcrum's approach has two key
strengths: 1) users can build more sophisticated searches with
the SQL access method than with Boolean operators; 2)
revisable-document retrieval is faster, because the SearchServer
stores the documents in their native format.

Figure 2. The Leverage of SearchTools ($50,000; 1/10th the Time to Develop). Source: Gartner Group

Glossary
API   Application programming interface
BLOB  Binary large object
DM    Document management
OEM   Original equipment manufacturer
SDK   Software developer's kit
VAR   Value-added reseller
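The note does not show SearchSQL or SFQL syntax, so the sketch below uses neither. It uses Python's built-in `sqlite3` module and two hypothetical tables, `doc` and `posting`. Its only purpose is to illustrate the general idea of keeping a full-text index in table-like structures. Once the index is a table, one ordinary SQL statement can combine a text condition with a structured one.

```python
import re
import sqlite3


def build(conn: sqlite3.Connection, docs: list[tuple[str, int, str]]) -> None:
    """Store document metadata and a word-level index as ordinary tables."""
    conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE posting (term TEXT, doc_id INTEGER REFERENCES doc(id))")
    for doc_id, (title, year, text) in enumerate(docs, start=1):
        conn.execute("INSERT INTO doc VALUES (?, ?, ?)", (doc_id, title, year))
        terms = set(re.findall(r"[a-z]+", text.lower()))
        conn.executemany("INSERT INTO posting VALUES (?, ?)", [(t, doc_id) for t in terms])


# Text condition (both words present) and structured condition (year) in one statement
BOTH_WORDS_SINCE = """
    SELECT d.title
    FROM doc AS d JOIN posting AS p ON p.doc_id = d.id
    WHERE p.term IN (?, ?) AND d.year >= ?
    GROUP BY d.id
    HAVING COUNT(DISTINCT p.term) = 2
    ORDER BY d.title
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn, [
        ("Emergency procedures", 1993, "Fire evacuation procedure for building A"),
        ("Old manual", 1989, "Fire evacuation drill"),
        ("Network guide", 1993, "Ethernet cabling"),
    ])
    print(conn.execute(BOTH_WORDS_SINCE, ("fire", "evacuation", 1990)).fetchall())
```

This index holds only words. Fulcrum's design, as described above, additionally keeps each document in its native format alongside the tables.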

Developers  familiar  with  Fulcrum’s  earlier  SDK  will  remember 
both  the  flexibility  and  complexity  of  the  250  API  calls  available 
for  application  development.  SearchTools  will  leverage  those 
developers  with  a dazzling  breakthrough  of  price  and 
productivity:  The  SearchServer  API  has  been  reduced  by  an 
order of magnitude to 25 “C” language routines for calling text-indexing
and search services, thereby reducing development
time; and the cost of the SearchTools SDK is now one-tenth its
former price (see Figure 2). An additional attraction of the
SearchServer API is that it is compliant with the SQL Access
Group's Call-Level Interface (CLI) standard for connectivity, data
retrieval  and  error  processing. 

In a significant expansion of its distribution channels, Fulcrum is
building a direct-sales force for marketing directly into corporate
IS departments. Fulcrum has traditionally marketed its products
indirectly through OEM and VAR channels. This strategy
presumes  two  distinct  streams  of  consumer  demand  for  text 
search-and-retrieval  technology:  1)  text  search-and-retrieval 
services  within  a line-of-business  application  (OEMs/VARs);  and 
2)  text  search-and-retrieval  as  a line-of-business  application 
(direct  sales).  Fulcrum’s  shift  away  from  its  long-time  strategy 
comes  at  an  interesting  time,  as  most  text-retrieval  vendors  are 
scrambling  to  emulate  Fulcrum’s  historical  strategy  through 
mergers  and  partnerships.  The  primary  risk  to  Fulcrum’s 
reputation  is  the  quality  of  end-user  applications  developed 
without  the  quality  control  and  support  typically  extended  to  OEM 
and  VAR  applications.  Fulcrum  will  be  challenged  to  support  a 
greatly  expanded  base  of  applications. 

Fulcrum  believes  that  there  is  a stand-alone  market  for  mission- 
critical,  text-retrieval  applications,  based  on  the  80  percent  of 
corporate data that exists in unstructured, text form. Gartner
Group is skeptical of the size of that market. We believe
SearchTools is a technology magnet that will attract other DM
middleware services for mixed-object-type applications by 1Q94
(0.7 probability) (see Research Note SPA-MAG-1106, 3/8/93).
Developers  will  be  drawn  to  the  low  price,  highly  functional  APIs 
and  familiar  SQL  paradigm. 


Software

Windows  Personal  Librarian  Turns 
Data  into  Information 


FAST  SEARCHES:  Windows  Personal 
Librarian  has  powerful  full-text  database 
features  and  sophisticated  searches. 


BY  JAMES  KARNEY 

How do you find the information you need efficiently when the data you have access to on hard disks and networks climbs up into the gigabyte range? Windows Personal Librarian (WPL) is like having a trained research librarian on call to manage the task. This $995 Windows-based program from Personal Library Software can give you capabilities normally found on dedicated full-text image retrieval systems, without programming or file conversions.

WPL's search engine was originally designed for large-scale and CD-ROM information management applications. The database can include text, graphics, hypertext, and linked sound and animation. It can be small, with a few word processing files, or huge, up to 4GB with 16 million records. Multiple databases can be searched at the same time, which makes the size of your storage system the limiting factor.

The real power of this program lies in its sophisticated searches. You enter a query using words or complete sentences like "Look up the files on the Iran-Iraq War." You can also use phrases, logical operators, and wildcards in your query. If you are unsure of spelling, the Fuzzy option will look for similar words.

We queried with the string "Iran-Iraq War" on a 3MB database containing 1,045 articles from newspapers of the late 1980s. The WPL main window displayed the document that the program considered the most important out of the 135 located. Scrolling through the rest of the selections was just a matter of using the plus and minus keys.


AUTOMATIC  WORD  LISTS 

The Expand and Concept options automatically generate lists of words for further investigation. The list we got for the Iran-Iraq query included North, Poindexter, Contra, and Sandinista, as well as Saddam, the names of Iranian leaders, and the reporters who wrote the articles. Any of these could be added to the query by clicking on them with the mouse. Other available windows list all matching files, show a bar chart indicating how focused the query is, and display a dictionary containing all the words in the database (except stop words like an, and, the, and so forth).

The search engine uses a number of statistical models to locate relevant information. The recovery quality gets better as the amount of data being examined increases. Instead of just placing files in the list as they are found, WPL ranks the documents with the most interesting ones first. The quality of these recovery techniques is excellent.

The time it takes to complete a search depends on the total size of your files. For informal testing we used the archives of London's The Independent, which are already indexed for Personal Librarian on a CD-ROM containing 455MB of data. The test system was a 33-MHz 486 with Microsoft Word for Windows running at the same time. Complex six- and seven-word Boolean queries took 4 to 6 seconds. With a 2MB database containing hypertext and graphic links located on a hard disk, the same type of query took less than 2 seconds. Simple three- and four-word search strings executed in 3 seconds and less than 1 second, respectively, for the two databases.

FACT FILE

Windows Personal Librarian, Version 3.0

Personal Library Software, 2400 Research Blvd., #350, Rockville, MD 20850; 301-990-1155

List Price: $995.

Requires: 2MB RAM (4MB recommended), 4MB hard disk space, Microsoft Windows 3.0 or later.

In Short: A very powerful retrieval tool that turns large amounts of electronic data into easily searched information. Its easy-to-use interface lets you perform sophisticated searches immediately.

SETTING UP TO SEARCH

There are two levels of sophistication when setting up WPL. The basic level lets novice users set up and query databases using the program’s defaults. Advanced features require more work, but using them is less complicated than designing spreadsheets.

Creating a new database is handled via the PL-Admin utility. Just select any combination of ASCII, Microsoft Word, or WordPerfect files you want to include. The files require no special modification and can reside on any storage device available to your computer, either locally or on a network.

Once the files are selected, an index file is automatically generated, which can take a while if you have a lot of files. Indexing can run in the background, and the final file will be 50 to 60 percent of the size of the source files. More files can be added to the database without building a new index.
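The review gives no detail on PL-Admin's index format. A minimal sketch of the general idea, assuming a plain in-memory inverted index (a map from each word to the documents that contain it), shows why new files can be added without rebuilding: each new document only appends entries. `InvertedIndex`, `add_document`, and `lookup` are hypothetical names.

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Toy word-to-document index that grows one document at a time."""

    def __init__(self):
        self.postings = defaultdict(dict)  # word -> {doc_id: occurrence count}
        self.doc_lengths = {}              # doc_id -> number of indexed words

    def add_document(self, doc_id, text):
        """Index one new document; existing postings are left untouched.

        Assumes each doc_id is added only once.
        """
        words = re.findall(r"[a-z0-9]+", text.lower())
        self.doc_lengths[doc_id] = len(words)
        for word in words:
            self.postings[word][doc_id] = self.postings[word].get(doc_id, 0) + 1

    def lookup(self, word):
        """Return the set of doc_ids whose text contains `word`."""
        return set(self.postings.get(word.lower(), {}))

index = InvertedIndex()
index.add_document("a.txt", "Iran and Iraq signed a ceasefire.")
index.add_document("b.txt", "Oil prices rose after the ceasefire.")
# Adding a file later only appends postings; nothing is rebuilt.
index.add_document("c.txt", "Iraq rebuilt its oil terminals.")
print(sorted(index.lookup("ceasefire")))
```

Keeping per-document word counts and lengths alongside the postings is what later makes frequency-based ranking possible without rereading the source files.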

Personal Library software is also available in DOS, Macintosh, Unix, and VMS versions. All can use shared databases. There is also an OCR Connection utility, sold separately, that can automatically add documents scanned using Calera or Kurzweil optical character recognition software to a target database.

Windows Personal Librarian lets you find and organize information you may not have realized you needed to see when you asked your question. Its ease of use and powerful search features justify its relatively high price tag. □

□ ONLINE DATABASES □

BY CAROL TENOPIR


The New Generation of Online Search Software


MOST OF TODAY’S database searchers spend a lot of time learning the commands of a variety of systems, how to formulate queries, and the correct use of Boolean operators. Even with end user systems, whether online, CD-ROM, or locally loaded databases, reference librarians report an increased need for bibliographic instruction. Why is something that makes research so much faster so complicated?

Part of the problem is that most of today’s online systems and many CD-ROM systems operate with essentially the same software developed for the first online systems 20 years ago. Although improved and sometimes rewritten in more modern programming languages, the major systems still reflect first-generation search techniques. They rely on exact match Boolean logic, structured commands or menu choices, and convoluted input syntax, features that may be advantageous to experienced searchers, allowing them to control the search process, but are unsatisfactory for end user systems.

Software improvements

“Innovations in Text Retrieval Software” (Online Databases, LJ, June 1, 1992, p. 94, 96) discussed the many improvements in text retrieval software that evolved from the information retrieval research laboratories into the software market for end users. More innovative search techniques such as natural-language input, relevance or word frequency ranking, and automatic thesaurus features are appearing in commercial products. A little more than a year ago, most innovations in retrieval software were available only for in-house databases, in software packages such as Topic, Personal Librarian, and ZyINDEX.

The new generation has now spread to the commercial online and CD-ROM environment. Thus far, the Personal Librarian (PL) software and Westlaw Is Natural (WIN) are two of the most successful representatives of the online second generation.

PL is still available as a stand-alone program, but it is now used as the search software for many CD-ROM products and as a search engine for several online hosts. WIN provides an alternative method for searching Westlaw’s many online legal databases. This month, I will profile Personal Librarian; next month I will focus on WIN.

Personal Librarian

PL, known originally as SIRE, was developed over a decade ago by Matthew Koll and his colleagues at Syracuse University as an experimental retrieval system. It was first offered in 1983 as a commercial software product for creation of microcomputer-based in-house databases. (Personal Library Software, Inc., 2400 Research Blvd., Rockville, MD 20850; 301-990-1155)

PL was the first commercial application to offer a number of search features. Natural-language input is perhaps the most obvious to users. Instead of entering a search statement in the correct and stilted Boolean operator form as with other systems, users can input any sentence that describes their information need.

Thus the statement “I need information about the effect of last year’s hurricanes in Florida and Hawaii on tourism” will work as an input string. PL does not use any artificial intelligence or other techniques for interpreting the meaning of the statement, nor does it match words to a thesaurus. Instead, it just eliminates stop words from the string, then ORs together all the remaining words. A concise statement loaded with meaningful words will thus work best.
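The two steps described above (drop stop words, then OR the rest) are simple enough to sketch. The stop list here is an illustrative assumption, not PL's actual list, and `natural_language_to_or_query` is a hypothetical helper.

```python
import re

# Illustrative stop list (an assumption; PL's real list is not given).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "i", "about"}

def natural_language_to_or_query(sentence):
    """Drop stop words, then OR the remaining words together.

    No parsing or thesaurus lookup takes place. Repeated words are kept
    once, since ORing a word with itself adds nothing.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    kept = []
    for word in words:
        if word not in STOP_WORDS and word not in kept:
            kept.append(word)
    return " OR ".join(kept)

print(natural_language_to_or_query(
    "I need information about the effect of last year's hurricanes "
    "in Florida and Hawaii on tourism"
))
```

Words such as "need" and "last" survive this stop list and are ORed into the query alongside the meaningful terms, which is why a concise statement works best.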

At first glance, a Boolean OR between every word may seem like sure disaster in full-text databases. It will retrieve many documents, but it works because the PL software uses relevance ranking. Retrieved documents are ranked in order of likely relevance; users can browse through documents until the relevance diminishes or their information need is met.

PL’s relevance ranking works by a complex formula that takes into account how many of the words occur in each document, how many times each word occurs, each document’s length, and how often each term occurs in the entire database compared with how many times it occurs in each document. Tests show that although the formula is not perfect, it does predict likely relevance much of the time.

Another notable search feature is the ability to use a relevant document as a query. When a “good” one is found, users can request similar documents. PL examines word occurrence in the relevant document to search for documents with similar word frequencies.
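PL's similarity method is not spelled out beyond "similar word frequencies." One standard way to compare word frequencies, used here as a stand-in, is cosine similarity between word-count vectors. The sample documents and the `cosine` and `more_like_this` helpers are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two word-count vectors: 1.0 = same proportions, 0.0 = no shared words."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def more_like_this(example_id, docs, top_n=2):
    """Rank the other documents by how closely their word frequencies match the example's."""
    vectors = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    target = vectors[example_id]
    scored = [(doc_id, cosine(target, vec)) for doc_id, vec in vectors.items() if doc_id != example_id]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

docs = {
    "good": "iran iraq war ceasefire talks iran iraq",
    "similar": "ceasefire talks between iran and iraq",
    "unrelated": "florida tourism after the hurricane",
}
print(more_like_this("good", docs))
```

Because the comparison uses whole word-count vectors, the "good" document acts as a long, automatically weighted query: its repeated words (iran, iraq) count for more than words it mentions once.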

Although PL has no standard thesaurus features, it will locate words that occur frequently with another word. A user can expand any search term to locate related terms in the database and then search the additional terms automatically.
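How PL measures that words "occur frequently with another word" is not described. A minimal sketch, assuming a plain count of the documents in which a candidate word appears alongside the seed term, looks like this; `related_terms` and the sample texts are hypothetical.

```python
from collections import Counter

def related_terms(seed, docs, stop_words=frozenset({"the", "and", "of", "a"}), top_n=3):
    """Suggest words that most often share a document with `seed`.

    Assumed measure: the number of documents containing both words.
    The seed itself and stop words are never suggested.
    """
    seed = seed.lower()
    together = Counter()
    for text in docs:
        words = dict.fromkeys(text.lower().split())  # unique words, original order kept
        if seed in words:
            together.update(w for w in words if w != seed and w not in stop_words)
    return [word for word, _ in together.most_common(top_n)]

docs = [
    "north testified about contra funding",
    "poindexter and north discussed contra funding",
    "the contra hearings called poindexter and north",
    "saddam rejected the ceasefire",
]
print(related_terms("contra", docs))
```

The top suggestions could then be ORed into the original query, which corresponds to the "search the additional terms automatically" step.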

PL is still known as a package for creating in-house databases, with DOS, Windows, Mac, UNIX, and VMS versions. Stand-alone and networked versions make it popular with both small companies and large corporations, including Apple Computer and Unisys. In the last year or two, the larger information industry has taken note of its strengths as well.

CD-ROM products

You may not even realize you are using PL when you purchase a CD-ROM product from the U.S. Government Printing Office or from a company such as Grolier. That’s because it has been adapted as the search engine by a variety of CD-ROM developers, many of whom put their own interface onto the powerful PL search engine. According to Richard Black, VP of Business Development at Personal Library Software, “several CD-ROM developers [have] our search engine buried deep in the bowels of their products with their own custom interfaces.” The licensing agreement, however, does require a copyright statement on the disc and on the packaging.

PL is now the search engine for several popular CD-ROM products. Among the notable titles are the U.S. Code from the U.S. House of Representatives, the Guinness Multimedia Disc of Records from Grolier, the American Memory Project from the Library of Congress, the Laws of Washington and Oregon from CD-Law, a series of British newspapers from the Financial Times (including the Economist, the Financial Times, the Daily Telegraph, the Independent, and others), and McCarthy’s compilation of full-text business articles from European business newspapers and magazines. Both text-only and multimedia CDs use PL.

Online expansion

PL has expanded into the online arena as well. One of the first online services to use PL as its search engine is Washington Alert, the online system from Congressional Quarterly. Washington Alert developed its own search interface to work with the PL search engine.

Washington Alert debuted with the PL search engine almost five years ago. The service includes approximately 20 databases that together provide comprehensive coverage of Congress and Congressional actions. Notable databases include the full text of the Congressional Quarterly Weekly Report and other newsletters; full texts of all bills and all committee reports; bill-tracking; and information about roll call floor votes, members of Congress, committee actions, and schedules.

America Online, the popular end user online service, announced in May 1993 that PL would be used as the search engine for an improved version of its online system. Again, America Online uses its own interface, but the search powers that make retrieval work are now from PL.

America Online is phasing in PL on its system, database by database. The first database searchable with PL is America Online’s member directory. This database gets more use than you may expect, since sending and receiving messages is America Online’s most popular service. America Online will continue to phase in the conversion of its full-text databases. It now offers news, sports, weather, stock market, and software databases, in addition to its popular E-mail and conferencing functions.

America Online also serves as a gateway to many databases on other online systems in addition to being a resident host. The gateway databases will continue to use the search engines of the systems that house them, but with the America Online interface. This may be confusing to users, since the connections are made transparently.

DataTimes announced in June 1993 that PL would replace BASIS as its newspaper archives software for minicomputer systems. (It did not announce that it will be using PL for its commercial online system.) DataTimes has two businesses: as a vendor of the online system and as a provider of internal library systems to newspapers. DataTimes had been selling micro versions of PL for Macintosh and Windows for its newspaper library systems clients. Starting in June, it will also offer the minicomputer version of PL to replace BASIS.

Dow Jones News/Retrieval (DJNR) may be the best-known online system to convert to PL. This past spring, rumors began to surface that DJNR would switch from IBM equipment to DEC computers and from IBM STAIRS to PL. Although the “official” announcement was delayed, PLS and Dow Jones have not denied the rumors.

Informed sources say PL has been selected as DJNR’s next generation search engine, with an interface to be developed by Dow Jones. Conversion has not yet begun, but expect something in 1994.

DIALOG buys into PLS

DIALOG Information Services, Inc. is involved with Personal Library Software in a slightly different vein. DIALOG announced in July a significant minority investment in Personal Library Software. It does not plan to replace the DIALOG software with the PLS system, but DIALOG will be represented on the PLS Board of Directors and the two will jointly develop new products.

One of the first application areas is likely to be with DIALOG’s CD-ROM products. PLS has been successful with multimedia CD-ROMs, an area in which DIALOG has done little to date. We can expect some multimedia and image/document management systems from DIALOG with Personal Library Software.

DIALOG and PLS is a marriage of market presence with forward-looking technical expertise. According to Patrick Tierney, president and CEO of DIALOG, “PLS has excellent technology, strong performance, and knows where the information business is heading. Together, our companies will develop synergistic products that will integrate text and image-based information from internal and external databases and deliver mission-critical data directly to users’ desktops.”

DIALOG denies plans to abandon its current CD-ROM search software, a program that has gotten consistently positive reviews. How the current system and PL will interact, and whether they will be used for separate databases or different types of applications, is still unclear. Perhaps more than anything, this announcement shows that Personal Librarian has arrived as a search engine and a company to be taken seriously in the larger information industry marketplace.


Why now?

Why is PLS attracting so much attention in the information industry now, after a decade of existence? PLS VP Black speculates that “things are changing very quickly in the entire information industry. Information vending and information retrieval are moving out of the hands of professional searchers and into the public domain. That’s the impetus.” The time is finally right for innovations that make software easier to use and go beyond the techniques we’ve been using since the early days.


Online/CD-ROM ’93

I will moderate a session at the Online/CD-ROM ’93 meeting in Washington, D.C., November 1, that will debate the relative effectiveness of “old-fashioned” command-based Boolean logic systems compared with natural language and relevance ranking systems. Speakers will represent both sides of the issue.

Next Month: More on the new generation of online systems with a look at Westlaw Is Natural (WIN).


Gentlemen, Start Your Engines! Online Deals Brace for Tomorrow’s Information Technology


by Mick O’Leary

Throughout the online information age, leading companies have worked with proud independence, relying upon themselves for technological innovation. And why should they look elsewhere? Didn’t the DIALOGs and Meads and Dow Jones News/Retrievals and OCLCs virtually invent commercial online information retrieval?

But the age of isolationism is over. Within a few months we have seen a flurry of partnerships, acquisitions, and deals among these one-time Lone Rangers. DIALOG and Sun, Mead and Folio, DIALOG and Personal Library Software, Dow Jones News/Retrieval and Personal Library Software, DIALOG and Advanced Research Technology and Telebase, OCLC and Information Dimensions: the long-time online industry isolationist status quo is gone.

Are these deals signs of weakness, admissions that the online leaders just can’t cut it alone any more? Actually, it’s the other way around. The deals demonstrate foresight and an ability to prepare for the online environment of the future, where no one knows it all and the smart players are the firms that make the shrewdest alliances.

“This Isn’t Kansas, Anymore...”

The online technology that most of us have known for as long as twenty years is gone... or, to put it more charitably, is giving way to a new, more complex, and more powerful paradigm. The comfortable old model has several basic technical principles: mainframe computers linked to terminals or microcomputers; ASCII data format; Boolean searching; and character-based interfaces. The new paradigm will have open system architecture with distributed computing; formats that allow graphic transmission and connectivity between host and diverse local environments; relevance search engines; and graphical interfaces.

This is not just technobabble. The new paradigm will offer faster, more powerful retrieval of richer data by searchers and end users alike, along with simpler, organization-wide connectivity. It will provide the growing legions of end users with genuine search power and allow searchers to step into more managerial and consultative roles. Putting all of these elements together, however, is far beyond the expertise of any single company. Hence the deals.


DIALOG Does Deals

No one has made more big deals lately than DIALOG. Perhaps the biggest is a long-term alliance with Sun Microsystems Computer Corporation to transfer DIALOG’s mainframe operations to Sun’s client/server, open system architecture. This will provide several internal advantages, including faster, cheaper operation and easier upgrades. Searchers, according to DIALOG Vice-President for Systems Development Gordon Schick, will benefit mainly through connectivity and graphics. The new systems will permit compatibility with local environments and distribution of varied data formats.

Schick explains that the DIALOG/Sun collaboration will unfold over several years, but that DIALOG users will see two substantial benefits as early as the end of this year. One will be delivery of search results by fax, including page graphics. The second is compatibility with many electronic mail systems through an X.400 gateway. Such e-mail connectivity will allow downloaded information to be seamlessly distributed throughout a consumer organization’s local environment.

DIALOG has been active in reforming its search technology as well. Alliances with Advanced Research Technologies (ART) and Telebase authorize these gateway services to devise specialized interfaces for individual clients. DIALOG of course has a series of end-user interfaces, including the Connections and DIALOG Menus, but these still aren’t enough. ART President Dan Meyer explains that many clients require a much higher degree of customization, which puts every aspect of the search process under automatic or menu-driven control.

ART is only two years old, but its interface-developing roots are deep. Meyer and several other ART principals were part of the original Telebase team that created EasyNet. The firm has done front-ends for GE Information Services, GEnie, the Bell Atlantic IntelliGate Business Service, and many private clients. ART’s interfaces, Meyer explains, customize database selection, specific search query formulation, and output. ART will even devise a “pricing” interface, which will translate conventional DIALOG pricing into subscription or pay-per-search pricing for the client.

Beyond Boolean

This summer DIALOG also purchased a minority share in Personal Library Software (PLS), whose flagship product is Personal Librarian (PL), its relevance search engine. PLS has been all over the online news lately. Besides the DIALOG deal, PLS is supplying its software to Dow Jones News/Retrieval, DataTimes, and America Online. This sudden fame, and the accompanying fuss over relevance searching, amuses PLS president and founder Matt Koll:

“A lot of what we do is not new. In 1983 we were shipping a retrieval system with natural language queries, relevance ranking, automatic word associations, concept searching, and search by example. That’s based on stuff that was in the literature for 20 years at that point.”

However, the time has arrived for PLS and relevance engines. Boolean retrieval may have reached the end of its life cycle, having been refined by DIALOG and others to the last degree. Today’s end users, Koll explains, will prefer relevance to Boolean searching:

“As we start getting into broader audience, most of the casual users do not want to learn Boolean operations. Natural language, relevance feedback searching will be a lot more attractive to them. In many ways it’s more effective. There are lots of circumstances where just being able to throw a few words and then interact is the most powerful and efficient way to find what you want.”

PLS’ first commercial online application was with Congressional Quarterly’s Washington Alert Service, back in 1990.

DIALOG’s ownership share in PLS has been estimated at ten to twenty percent. Nevertheless, DIALOG’s Gordon Schick emphasizes that DIALOG does not yet have an implementation plan for PL and may even use other relevance engines as well. DIALOG is also evaluating the WIN search engine developed by Westlaw, its alliance partner.

Regardless of what DIALOG does with PL, relevance searching is coming to DIALOG, as Marketing Manager Libby Trudell acknowledges: “Non-Boolean search engines are going to have to be available. We definitely see that in our future.”


The proven PLS engine could very well become the system of choice throughout DIALOG. Although this search technology has been associated with full-text databases, Koll explains that it works well with abstract and directory files as well:

“[The] algorithms [work] very well with [...] small to medium records, as long as there is a little bit of information to give the relevance ranking something to work with.”

Dow Jones News/Retrieval and DataTimes Kick Tires

PLS will have a lot to work with in the big full-text databases on Dow Jones News/Retrieval and DataTimes. News about the Dow Jones News/Retrieval and DataTimes deal with PLS slipped out before either company could make a formal announcement, catching both off-guard. Neither Matt Koll nor Dow Jones officials have commented upon the outcome, yet it is interesting to speculate upon Dow Jones News/Retrieval’s intentions, especially in view of its history in relevance searching.

In 1990 Dow Jones announced DOWQUEST, making it and CQ’s Washington Alert Service the first commercial online services to offer natural language/relevance searching. DOWQUEST, a very complex and expensive development project, uses an engine developed by Thinking Machines. Although there are many conceptual and technical differences between the Thinking Machines and PLS engines, to the casual searcher they are similar.

The PLS deal raises intriguing questions. Is PL a replacement for an unsatisfactory Thinking Machines engine? What is the implementation plan for Personal Librarian software? How will it be integrated with Dow Jones News/Retrieval’s present interface?

DataTimes will undoubtedly share any Dow Jones News/Retrieval-PLS implementation plan. The two services have a common technical platform, and DataTimes itself has already had considerable experience with PLS. It employs Personal Librarian as the search engine for PC DataTimes, which is used by its smaller newspaper clients for internal text retrieval.

DataTimes recently announced that Personal Librarian would be used for a larger, client/server version that would support newspapers of all sizes. According to Marketing Manager Ed Roach, this product will not be available to DataTimes’ online service clients, but he emphasizes that DataTimes is very interested in the broader application of the PLS engine.

The prominence of PLS in the plans of so many major online services has quickly made the small firm an online industry heavyweight. When asked if deals with competitors aren’t conflicts of interest, PLS’ Matt Koll replies:

“We think that all of the people we are working with will remain happy that we have the integrity to do what is in the interest of our partner.”

Serving two or more masters is of course common in the online world; database producers have been doing it since the beginning. PLS’ wide acceptance also raises the question of whether it is emerging as a de facto standard for natural language/relevance retrieval. PLS could be on its way to being the Microsoft of online searching.

Summary and Conclusions

Verity has a high-end information retrieval product that is positioned in a highly fragmented market. The growth of online services and electronic messaging offers new opportunities for vendors of this type of software. Verity needs to capitalize on its good technical reputation, continue packaging the software into modules, and market aggressively to attract high-quality distribution partners. This is a risky investment that requires world-class marketing to succeed.

1. The company has many challenges to overcome, both technical and marketing. To succeed, the company will have to continue to re-engineer its products and have world-class marketing. The re-engineering is going well.

2. Philippe Courtot has generally been well received by key accounts. They mentioned that he had made Verity more customer focused. For example, he ran a successful user conference; the only negative comments were concerns about the company's ability to deliver modular software.

3. Major problems that Philippe is trying to rectify include finding a VP of Marketing and getting code development on schedule. Philip Nelson, a founder and VP Engineering, has already demonstrated the ability to get projects on track. Adobe also mentioned that they were pleased with the release they received in March, on schedule to within a week.

4. Success will require major alliances with developers. Adobe and Lotus are two of Verity's partners. Key to Verity's success will be the ability to penetrate the value-added reseller (VAR) market.

5. Both large and small system integrators need to be given the tools required to support their teams of programmers. This includes software modules for connecting to other systems and training materials for both sales and engineering staffs.

6. To succeed, the company will need a scalable product line, from the desktop to the enterprise. This may require the acquisition of smaller companies over time.

Introduction


This report reviews the market opportunities for Verity, a full-text retrieval software vendor. It indicates steps that Verity needs to take to become a successful company. It also provides both customer and personal references.

Market Analysis

Historical Positioning

When large full-text retrieval systems were first launched on mainframe computers, their main function was to find documents, with a simple sort by date or search field. Verity's TOPIC adds another dimension: it ranks documents based on search relevance.

In addition, search technology has been applied to publishing systems for two reasons. First, document publishers need to be able to find text already stored on a system to add to their publications. Second, information searchers need to publish and present document search results professionally. Frame has integrated Verity's TOPIC as a search engine for its FrameMaker software. Interleaf is another publishing software company that has chosen to market full-text retrieval software, WorldView, as part of its product line. WorldView provides full-text retrieval, based on Fulcrum’s Ful/Text, for electronic document distribution. A key development in the integration of publishing and retrieval systems has been SGML (Standard Generalized Markup Language). SGML lets the structure of a document be represented in a form that carries across different computers, so the same document can be retrieved and published on both screens and printers. For example, Silicon Graphics uses software from Passage Systems to display its online technical documentation and electronic books.

Lotus Notes is a client/server database that can route documents from one desk
to another. Combining a search engine with Lotus Notes enables documents to
be found; once found, they can be merged, processed and analyzed. Verity has
also released a World Wide Web (WWW) server for the Internet that companies
can use to post information on public data networks. Some companies are
starting to compose personal newspapers electronically from full-text searches.
One example is Individual, a provider of headline news services to corporations
and individual subscribers. Individual's software, SMART, analyzes documents
using retrieval rules, with human intervention as needed to modify the search.
Exhibit 1 shows the evolution of functionality found in systems that use
full-text retrieval. With its partners Frame and Lotus, Verity's TOPIC is a
player in all areas.


Exhibit  1 


Functional  Evolution  Of  Full  Text  Software 
Market Conditions

Verity sees a fragmented market, with vendors ranging from established
hardware manufacturers like IBM, through academic software, to PC shareware.
The sheer number of full-text search vendors shows that there is a market;
Verity needs to take a leadership role in it. Customers want:

• powerful  user  interfaces 

• scalable  modules  - ability  to  scale  from  a single  PC  to  an  enterprise 

• APIs (application programming interfaces) for easy integration

• ease  of  set  up,  implementation  and  maintenance 

• accurate searching

• reliable performance

• rapid retrieval speed


The issue is whether Verity can outmarket its competitors. To date Verity has
competed on speed, performance and searching. It also competes on the
overall quality of its software integration environment and APIs. User
interfaces, modularity and ease of system maintenance have not been key
strengths. Instead, Verity relies either on embedding its software in other
applications, such as Lotus Notes, to improve the user interface, or on the
skills of a professional information scientist. To succeed, it has to
leverage its strengths through partners that can compensate for its weaknesses.


Verity will grow by focusing on emerging markets for which it can provide a
modular solution, such as enterprise electronic document retrieval,
consumer online services and electronic messaging applications. As it
gains presence in these high-growth areas, it can then displace competitors in
more established areas like document imaging, legal and corporate online
information services.

Competitive  Positioning 

There are hundreds of full-text search software packages, ranging from
shareware for single PC users to massive custom systems based on parallel
processing. Verity cannot take this entire market with its current
product line, even though this may be desirable in the long term. Verity is
focusing on client/server solutions for the enterprise. The software is
scalable, so it can be used by small groups, but Verity will not be a good
solution for casual PC users, who can get simple file-searching solutions from
companies like On Technology with On Location.

At the high end, database companies can compete with Verity to some extent.
Oracle's SQL*TextRetrieval server has not been a success, fundamentally
because the architecture required to retrieve full-text documents is not the same
as that required for data fields and binary files. Oracle's recent announcement
of ConText suggests that Oracle is moving further into the text retrieval market.
ConText focuses on analyzing streams of data rather than on managing the
storage of full-text documents. ConText combined with an Oracle database may
compete with Verity in applications like reading newsfeeds, because it will be able
to select items of interest from a news collection. However, the underlying
storage provided by Oracle will not be tailored for full-text searching in the way
that Verity's database is. A further indication that database companies are not
really in the same business as Verity is that Sybase is working with Verity for
its full-text technology.

Verity's biggest competitor in the UNIX market is Ottawa-based Fulcrum.
Fulcrum has established itself as the leading supplier of enabling technology for
full-text retrieval in the UNIX environment, and it is now moving toward an
end-user model. Fulcrum claims to have over 100 development partners, several
thousand installations and over 250,000 CD-ROMs using its software. Exhibit 2
below compares Fulcrum and Verity.


Exhibit 2


Fulcrum vs. Verity


Accuracy of query processing

  Verity: High quality using TOPICS. Setting up TOPICS is currently also a
  disadvantage, as it takes time to set up the topics. However, Verity is
  working on automating the creation of topics and on supplying standard
  topics.

  Fulcrum: Uses many methods, including matching to a sample document and
  statistical relevance ranking. Does not work as accurately as Verity. It
  may be better than Verity at retrieving documents associated with phrases
  rather than individual words.

Quality of APIs

  At both Lotus and Adobe, Verity beat Fulcrum because it had better APIs
  for developers.

SQL support

  A key focus that makes the product useful to those familiar with SQL
  databases.

Multi-threaded engine

  Yes

SGML support

  Acquired 34% of Exoterica, an SGML company.

Size of client software libraries for MS Windows (DLLs)

  Verity: 100K to 200K using VDK

  Fulcrum: 1MB
BASIS from Information Dimensions has an installed base in many
corporations. Information Dimensions provides more user consulting and
support than either Verity or Fulcrum. Its system is aging, and Verity has seen
less competition from BASIS in the last few years than in the late 80s when it
started. Excalibur is also a competitor. Excalibur integrates its software well
with scanning and OCR solutions and is more likely to win in situations where
scanning paper files and converting them to text is important. In its last
quarter, ending April 1994, its quarterly revenues fell to $2.1M from $2.4M a
year before. Excalibur is much less likely to align itself with OEMs and VARs
to the extent that Fulcrum has or that Verity plans to. For the Internet, WAIS
is a public domain package that is being commercialized. It uses the Z39.50
protocol for retrieving data, which is somewhat limited. Verity has produced a
World Wide Web server for the Internet which will provide better searching
functionality than WAIS.

ConQuest (410-290-7150, Columbia, MD) is an emerging competitor. ConQuest
is working with Motorola on an online information service. Folio Views is also a
competitor at the low end, although it is more suitable for applications that
involve publishing a database than for workgroup applications.


Market  Size 

The market for full-text retrieval software is estimated at about $200M to
$300M. It is growing at about 30% a year, according to industry estimates. This
growth rate is in line with INPUT's estimates for the US client/server software
market. The full-text retrieval market is considerably smaller than the database
market.

Verity's growth will be constrained by its sales capacity. Each sales person can
probably sell on average $1M to $2M per year, depending on distribution channel.
Hence 50 sales people could sell $50M to $100M, giving a 25-30% market share.
This would also require another 50 support people for technical support, field
sales support and administration. In addition, the budget for marketing,
promotion and advertising required to support and justify such an effort would
be about $15M per year. With an average sales person costing $180K a year
including travel and expenses, and administrative staff costing $130K, the total
sales and marketing budget required to achieve such penetration would be close
to $30M. This is out of the question for a start-up, so the market must be more
narrowly defined. The market can be segmented by:

• operating  system  platform  - Verity  is  targeting  UNIX  and  NT  servers 
• database platform - Verity is working with Sybase and potentially
could  work  with  all  the  database  companies 

• enabling  software  - Verity  is  working  with  Lotus  Notes  and  Adobe 
Acrobat 

• customer  size  - typically  Verity  has  worked  with  large  customers  and 
is  moving  to  work  with  smaller  companies  through  distributors 

• software  architecture  - Verity  is  targeting  client/server  architectures 
with  UNIX  servers 

By focusing on the lucrative enterprise client/server market and selling through
indirect channels, Verity can accumulate the capital needed to move into other
segments. Another key success factor will be for Verity to ensure that third
parties can provide the complete solution for various applications and
industries, including TOPIC files.
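The sales-capacity and budget arithmetic in the Market Size discussion can be checked with a short Python sketch. All figures are the report's own; applying the $130K administrative cost to each of the 50 support people is an assumption, but it is the reading that reproduces the "close to $30M" total. Pairing the low sales estimate with the low market estimate (and high with high) gives roughly 25% to 33%, in the neighborhood of the report's 25-30% figure.

```python
# Sketch of the Market Size arithmetic. Figures come from the report;
# applying the $130K cost to all 50 support staff is an assumption.

SALES_PEOPLE = 50
SUPPORT_STAFF = 50
SALES_PER_PERSON = (1_000_000, 2_000_000)   # low / high annual sales per rep
MARKET_SIZE = (200_000_000, 300_000_000)    # low / high market estimate
COST_PER_SALES_PERSON = 180_000             # incl. travel and expenses
COST_PER_SUPPORT_PERSON = 130_000           # assumed for all support staff
MARKETING_BUDGET = 15_000_000               # marketing, promotion, advertising


def sales_capacity(people: int, per_person: int) -> int:
    """Annual revenue a sales force of this size could book."""
    return people * per_person


def sales_and_marketing_budget() -> int:
    """Annual cost of sales people, support staff and marketing programs."""
    return (SALES_PEOPLE * COST_PER_SALES_PERSON
            + SUPPORT_STAFF * COST_PER_SUPPORT_PERSON
            + MARKETING_BUDGET)


low = sales_capacity(SALES_PEOPLE, SALES_PER_PERSON[0])
high = sales_capacity(SALES_PEOPLE, SALES_PER_PERSON[1])
# Pair low capacity with the low market estimate, high with high.
share_low = low / MARKET_SIZE[0]
share_high = high / MARKET_SIZE[1]

print(f"Sales capacity: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
print(f"Market share:   {share_low:.0%} to {share_high:.0%}")
print(f"S&M budget:     ${sales_and_marketing_budget() / 1e6:.1f}M per year")
```

The budget works out to $9.0M for sales people, $6.5M for support staff and $15M for marketing, or $30.5M in total.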

Market  Trends 

• Usage-based pricing. Several software companies are considering usage-based
pricing. For example, Cincom has a pricing scheme where users pay depending
on how much they use a database. It is still unclear whether software
companies will be compensated for usage, or whether it will only apply to
content companies that supply the information stored in databases. In the
enterprise this is not likely to affect Verity for several years, but in public
networks it could affect Verity's business model.

• Explosion in public and private messaging. Major corporations have used
electronic mail for years. As small businesses, non-profits and individuals
discover the technology, there will be more electronic documents. The use of
Verity for specific applications like classifieds at the San Jose Mercury News
is an example of a public online service.

• Full-text software built into the operating system. On the one hand,
companies like IBM are moving to microkernels and multiple operating systems.
On the other hand, companies like Microsoft are integrating more tools,
including full text, into their Windows family of operating systems. This
offers an opportunity for Verity to supply other operating system vendors,
namely the UNIX hardware manufacturers, with embedded full-text technology.
Verity's modular approach gives it the opportunity to supply tools that work
around other commodity full-text engines.
• Automated customer service and trouble reporting. These are also
document-intensive applications that are based on electronic messaging.
Customer service frequently requires fast response and accurate answers.


Verity's  Strategy 

To be successful, Verity can take either a vertically integrated full-solution
approach or a horizontal approach. The vertically integrated approach requires
Verity to acquire or partner with the vendors of software that can provide a
customer with a full solution. Verity initially took this approach in the
government market and with a few corporate accounts. However, it proved
costly to support.

The  alternative  approach  is  to  rely  on  third  parties  for  distribution,  software 
additions  and  support.  This  is  the  course  chosen  by  Verity.  It  should  leverage 
Verity's  strengths  as  a technology  supplier  and  accelerate  market  penetration. 

A key element of Verity's strategy is modularity. Modules are required for:

• Casual  user  clients 

• Expert  user  clients 

• Searching  ported  to  various  servers 

• CD-ROM  authors 

• Third  party  software  developers  - like  Lotus 

• TOPICS  - customized  by  application  and  market 

• Administration 

• Agents,  routing  and  directories 

Verity has made the architectural switch to client/server modules, and the key
will be to interconnect them efficiently and seamlessly. Another key challenge is
pricing. The agent developers' kit at $2,950 is acceptable for now, but it needs to
get to no more than $300 to be attractive to the mass market.
Operational Priorities

• Improve customer relations. Philippe Courtot has set Verity a clear
mission: improve customer relations and build modular products. All the
customers interviewed commented on how well they had been treated
compared with their prior experience.

• Improve employee teamwork and morale. Before Philippe's arrival, Verity
appears to have had strife between engineering and sales, with detrimental
results. Philippe has addressed the problem with a new sales force, but he
still has to staff engineering fully, and marketing lacks leadership. It is hard
for a Silicon Valley company to attract engineers after the initial wave of
start-up energy has died down. Energizing the organization will be one of the
hardest jobs for Philippe and his team, but a promising start has been made in
this direction.

• Package  the  software  in  modules.  As  software  prices  decline,  operating 
systems  include  full  text  search  engines  and  information  services  move  to 
transaction-based  pricing 

• Ship on time. Philip Nelson, recently appointed VP Technology, reported that
Verity had reset its schedules for Version 4.0 and shipped according to the
revised schedules. Another ship date, March 28th, had been met within a
week. Philip said that he had tried to set realistic schedules and that working
closely with HP, Lotus and Adobe had helped keep the schedules under control.

• Improve user interface. Philip reported that they were working hard on this
and that many aspects of the user interface, especially in the area of
administration, had already been improved. Verity sees this as a key priority.

• Attract high-quality engineers. In the past few weeks five engineers have been
hired, and Philip is confident that they can rebuild a high-calibre organization.
He emphasized that the engineers hired were mainly friends of other engineers,
so they were confident that they could work together.

• Attract the right VARs and consultants. Philippe mentioned that this was key
to their success. Once they have a marketing team in place, they can be expected
to attract the appropriate partners.

Partnering  Opportunities 

To ramp up its business, Verity is looking to indirect distribution channels and
also OEM relationships. To be an OEM supplier, Verity has to maintain a
technological lead. The company has to focus on distribution channels in an
orderly fashion, as follows:

Corporate users

• corporate users - attract for good references, keep current sites

• refocus on systems integration groups when ready

OEMs

• hardware manufacturers like HP - compete for platform now,
especially with UNIX vendors such as Silicon Graphics, Sequent,
Unisys, IBM, Sun

• database vendors - compete for platform now, especially Sybase; also
try Informix, which once acquired a full-text company

• document software vendors - publishers like Frame and Adobe,
workflow vendors like Lotus Notes, display software like No Hands

• networking vendors like Novell

• copier companies - Xerox, Kodak and Japanese vendors

• when established, increase presence with overseas OEMs

VARs

• continue to build vertical market presence through VARs,
in particular VARs that cannot afford to maintain their home-grown
text retrieval code for applications such as document
management, imaging, legal and publishing

System Integrators

• when the code is well-documented, modular, stable and supported by
VARs, attract in-house system integrators in major corporations and
system integrators like SHL Systemhouse and Andersen, to achieve
widespread deployment

• attract federal contractors to bid with Verity once the product is
rolling with the commercial integrators
Partners That May Acquire Verity

Adobe - needs more than PostScript and Acrobat - yet another tool for their bag

AT&T - for online information services (Easylink division) or Global
Information Systems division - something for them to integrate.

Computer Associates - to go with their numerous databases - they acquire many
older software companies that need a direct sales force.

DEBIS - Daimler-Benz Information Systems - owns AEG - world's largest postal
sorting OCR technology supplier - may want to expand document management
systems integration business in Europe.

Informix - once acquired a full-text company in the mid-80s - what happened?

Kodak - to realize George Fisher's vision of an electronic world. Fulcrum is a
key Kodak supplier for the Optistar writeable CD system for data centers.

Lotus - once acquired BlueFish for text retrieval for CD-ROM publishing in the
mid-80s - what happened?

Novell - unlikely - but if Verity integrated well with AppWare and NetWare (as
well as WordPerfect), it may make sense for the larger enterprise

Sybase - unlikely, as they have many choices

Wang - still a $900M company interested in documents

Xerox - even though they have XSoft, they can acquire competing technologies -
there are so many divisions - sell it in El Segundo or Rochester to someone who
does not like XSoft.


References 

Verity  Employees 

Philippe Courtot (President), Phil Nelson (VP Engineering) and Sue
Barsamian (NA sales) are all highly enthusiastic and energetic about the new
direction of the company. Philippe clearly articulated his plans, Phil explained
how he had motivated engineering to get projects on schedule, and Sue explained
how she sold against the competition.
End-user Customers


Bernard  Morere  - 216-271-8929  BP 

Used  internally  for  hazardous  waste  and  health  and  safety  documents  for  about 
6 years. Also uses it with AutoCAD and Verity Image Viewer for engineering.

Help  desk  is  another  application.  Spread  to  European  offices.  May  use  at  other 
sites  internally.  Very  satisfied. 

Alternatives considered were BASIS, Digital's Book Reader and Videotext, and
Fulcrum. Used an RFP to map capabilities to requirements. Still believes that
Verity is top of the line and would choose it again. Finds Philippe Courtot very
easy to deal with - good rapport. Likes the new company philosophy and wonders
how pricing will affect the ability to deploy on a wider basis and how
shrink-wrapped products will turn out.

New applications will be for reading news wires for trading desks monitoring
crude oil futures and related news. Weaknesses seem to be a gap between
demos and what is actually shipping. In particular, Verity is tackling installation
and maintenance, a key area of interest. Hopes that he does not have to spend
hours of consulting time, as in the past, to get the platform stable. Hopes that
Verity can deliver on what it promised at its user conference.


Dave  Sharp  - Legal  Dept.  - 713-374-2744  Compaq 

Uses Verity's TOPIC to anticipate lawsuits - does the work of 2,600 filing
clerks, going through over 1.5M documents in 8 minutes. They can OCR 40,000
pages a day. Has licensed the technology to other customers. Very satisfied. Has
shown it to over 45 other companies. Noticed a big improvement in customer
relations when Philippe was hired. Philippe went out of his way to help Compaq
- much appreciated this help.


OEM  Customers 

Greg  Holmgren  - 408-922-2797  Frame  Technologies 

Evaluated many alternatives to Verity. Said that prior to the failed merger with
Frame, engineers at Verity hated sales. May or may not continue with Verity - not
entirely convinced it is the best technical solution - Claritech from
Carnegie-Mellon is technically about the same. Claritech is not as mature a company as
Darell Long 415-390-8200 Global Village

Had really enjoyed working for Philippe Courtot at cc:Mail. Philippe has a strong
personality, lots of charisma and works best with forceful people who can measure
up to him. Has heard people refer to Philippe as a marketing genius. Feels that he
is heading in the right direction at Verity.

Darell joined Verity primarily to work for Philippe, but only stayed there 8
weeks. Verity was a very sick company when Philippe took over, with a vicious
political environment. It could not be fixed without some drama. Customer
expectations had been set too high too early. Darell felt that it was too hard to
succeed there personally, given the political struggles and the masses of work
that needed to be done to fix things.

Other  References 

David  Stamm  - 408-428-2010  President,  Clarify 

Clarify uses Fulcrum, not Verity yet. They may support both, just as they support
Oracle and Sybase for help desk applications. He claims a typical revenue
breakdown is $800 to $1,000 per user (using a concurrent floating license model)
for the full-text search module in a help desk system. Of this, the OEM supplier
gets about 15%, which gives a company like Verity $120 to $150 per seat. He
claims this is sustainable for the next few years. (I question whether software
prices will stay this high - what if Verity can only get $15 to $30 per seat,
which is more likely over the next few years?) Has hired a few people from
Verity's consulting group - excellent people - as Verity moves to a more indirect
sales model. Decided to go with Fulcrum because at the time Verity could not
deliver in time, he needed to meet a schedule, and Verity was in disarray. Now
Verity seems to be improving, so he may reconsider; however, Fulcrum will not be
displaced. The reason they would support more than one full-text package is to
meet customer compatibility requirements - for example, Cisco already has
documents in Verity format, and if Clarify cannot support them they could lose a
sale to Cisco.
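A minimal Python sketch of the per-seat arithmetic in David Stamm's comments, assuming the 15% share applies to the full $800 to $1,000 per-user price (the reading that reproduces his $120 to $150 figure). The comparison with the $15 to $30 scenario is only a ratio of the quoted figures, not a forecast; the function and constant names are illustrative.

```python
# Per-seat arithmetic from the Clarify reference. Figures are as quoted;
# applying the 15% share to the whole per-user price is an assumption.

OEM_SHARE = 0.15            # share of the per-user price reaching the OEM supplier
PRICE_RANGE = (800, 1000)   # $ per user for the full-text module (Stamm)
LOW_CASE = (15, 30)         # $ per seat in the report author's scenario


def supplier_revenue_per_seat(price: float, share: float = OEM_SHARE) -> float:
    """Dollars per seat that reach the search-engine supplier."""
    return price * share


quoted = [supplier_revenue_per_seat(p) for p in PRICE_RANGE]
print(f"Quoted case: ${quoted[0]:.0f} to ${quoted[1]:.0f} per seat")

# How many times smaller the author's scenario is than the quoted figures.
smallest_gap = quoted[0] / LOW_CASE[1]   # $120 vs $30
largest_gap = quoted[1] / LOW_CASE[0]    # $150 vs $15
print(f"Author's case is {smallest_gap:.0f}x to {largest_gap:.0f}x lower")
```

Under these assumptions the author's scenario implies per-seat revenue roughly 4 to 10 times lower than Stamm's figures.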

List  Of  Potential  Competitors 

Personal  Library  Software 

Richard  Black,  ext  241 
rmb2@pls.com 

2400  Research  Blvd.,  Suite  350 
Rockville,  Maryland  20850 
301.990.1155 
301.963.9738  Fax 

This has been licensed by Apple as AppleSearch. Also has Callable
Personal Librarian C tools with a search engine - potentially a
dangerous competitor in the C development community and OEM
channels. Relevance ranking using post-Boolean and natural
language techniques. Generally not as accurate as TOPIC.


Fulcrum  Technology,  SearchTools 

275  Shoreline  Drive,  Suite  510 
Redwood  City,  CA  94065 
415.802.7050 

7060  Fax 

John  Carr,  Director  US  West  & Central 
Charles  Neumeier,  Sales  Manager 
7054 

chuckn@fultech.com 
Miles  Kehoe  7052 
SERIOUS  COMPETITOR. 

This  is  a very  well-respected  OEM  vendor,  moving  into  the  end-user 
market.  Strong  in  the  UNIX  server  market. 


OpenText 

Lee  Levin,  VP  Western  Region 
5101  136th  Street  S.W. 

Edmonds,  WA  98026 
206.742.5951 

6077  Fax 

University of Waterloo alumni, Canada. Started by doing the Oxford
English Dictionary on CD-ROM. Customers include Canadian
Pharmaceutical Assn., Union Bank of Switzerland, Mutual Life of
Canada, NSA, Grolier Publishing, Peugeot, Caterpillar. Expected
1993 revenues of $2M to $3M. Research software.

Jouve 

Mark  Biskebom,  President 
800.835.6883 
203.488.6625 
203.481.1133  Fax 
500  East  Main  St.,  Suite  328 
Branford,  CT  06405-2911 
Not  a serious  competitor. 

ConQuest  Software 

Bob Kaminski
bob_kaminski@cq.com
9705  Patuxent  Woods  Drive 
Columbia,  Maryland  21046 
410.290.6290 

6292  Fax 




800.787.1715 
Paul  Nelson,  VP  Dev. 

Mr.  Addison,  Pres  ? 

MaryBeth,  Addison's  assistant 

Could  be  serious  competitor.  Emerging  company  working  with 
Motorola. 

WAIS,  Inc. 

John  Duhring 
415.617.0449 
duhring@wais.com
Nathaniel  Lee 

415.617.0444 
info@wais.com 
1040  Noel  Drive 
Menlo  Park,  CA  94025 
415.327.WAIS  (9247) 

415.327.6513  Fax 
frontdesk@wais.com

Will  be  for  Internet  searching  - don't  forget  WAIS  is  free  public 
domain  software  for  searching  Internet  files. 

INQUIRY  - UMass 

Bruce  Croft 

croft%perth@cs.umass.edu
413.545.0463 
413.545.1249 
Email  Postscript 
Paul  McOwen 
2475 

Northern  Telecom,  Helmsman 

Tom  Van  Atta 
TN 

615.734.4405 

615.734.5189 

InteleQ,  IQ/Textract 

Penny  Fulton 
Bill Moss, Principal

Mathew  McAndrew,  VP  Sales  & Marketing 
768  Walker  Road,  Suite  227 
Great Falls, Virginia 22066

703.757.7592 

703.757.7593  Fax 

Information  Access  Systems,  ITMS 

303.442.6224 
4530 Fax
Eariene  Busch 
Russ  Holsdaw 
3085  Bluff  St. 

Boulder,  CO  80301 

Thunderstone, Metamorph

11115  Edgewater  Drive 
Cleveland,  OH  44102 
216.631.8544 
281-0828  Fax 
Bart  Richards 
epi@thunderstone.com

Alliance  Technologies,  TextMachine 

Mike O'Grey, Technical something manager
TX 

512.794.9856 
512.794.0199  Fax 
Server  Unix  only 

HNC,  Inc. 

5501 Oberlin Drive
San Diego, CA 92121-1718
619.546.8877 
452.6524  Fax 

Todd  Gutschow,  VP  Application  Development 

i 

Claritech 

Suite  200A,  319  S.  Craig  St. 

Pittsburgh,  PA  15213 
Elise Yoder
eyoder@clarit.com 
412.621.0570 
412.621.0569  Fax 
David  Evans,  CTO 
Eric  Brown,  eeb 

Tech  International 

12701  Fair  Lakes  Circle,  Suite  870 

Fairfax, Virginia 22033

703.631.6895 

703.631.6734  Fax 

Edward  G.  Newman,  Executive  VP 

Carol  Kovan 

InfoSoft, Houghton Mifflin spin-off

Intelliscope 




Kara  Pinkerton 
Senior  Account  Manager 
22525  S.E.  64th  Place 
Suite  210 

Issaquah,  WA  98027 

206.557.3644 

206.557.3645  Fax 
Kirby  Mansfield 


Noblenet,  RPC  tools 

Virginia  Systems,  Midlothian,  VA  804-739-3200  Sonar 
Professional  ($795),  Sonar  Text  Retrieval  ($295)  - retrieves  text 
from  Mac  files. 

Microdynamics MARS - for Apple Macintosh imaging systems,
now a subsidiary of another company. Silver Spring, MD (301-589-6300)
($70K+ per system). Provides complete systems for
imaging, OCR, scanning and retrieval.

Empirical  Research  Systems  - Tacoma,  WA  - 206-627-8511, 

Fax:  206-627-5934 

MINDS - Hypertext Retrieval System - asks the user a question
and  finds  information.  Used  for  tech  support,  legal,  government, 
technical  documentation. 

Skytronics  Software  - Nottingham,  UK  011-44-602-864350,  Fax: 
011-44-602-861717. Found-It! Text Retrieval - for word processing
files. 

Knowledge  Set  - Mountain  View,  CA  - CD-ROM  publishing,  first 
product was in the 1985 timeframe - Grolier Encyclopaedia, one of the
first  CD-ROMs  for  PCs.  Acts  as  a service  for  major  publishers  to 
put  information  on  CD-ROMs. 

Academic  software  from  Cornell  University  is  believed  to  be  the 
foundation  for  Individual,  Inc.'s  SMART  software  for  Heads  Up!  and 
First!  daily  electronic  news  services. 

XSoft - Xerox Document Management Systems subsidiary. 415-424-0111, fax: 415-813-7181, info@xsoft.xerox.com. Visual Recall - based on Xerox PARC technology.

There are numerous small PC-based desktop systems from Alki (206-286-2600) ($39.95), Claris (408-727-8227) ($69), Microlytics (716-248-9150) ($40) and On Technology (617-374-1400) ($129).

Word processing vendors also incorporate full-text search of word processing files into their products.

NeXTStep from NeXT has full-text search built into the operating system.
Printed By: Philippe Courtot 7/7/94 6:24 Page: 1

From: Philippe Courtot (7/7/94)

To: Steve Hall

CC: Sue Barsamian, Philippe Courtot, Stephany Demy, Laurent Le Foil,

John  Lehurian,  don  mccauley,  Hugo  Sluimer 

REGARDING Q1 OUTLOOK and Business Mix Analysis

Q1 OUTLOOK


As of last Monday, the Outlook for Q1 is as follows:

NEurope          615K   (risk: HM Customs $65K)
CEurope          195K
SEurope           50K
Total Europe     860K
US Comm          830K
US Fed           275K
ROW              260K
OEMs              45K
Total          2,270K   versus a FCST of $2.4M and a PLAN of $2.6M

Upside: WRQ $25K, CISCO $15K (HP WWW in Outlook at $100K)
Upside: McDonnell Douglas $100K, AF Pentagon $35K


Maint.           935K
ConsEU           185K
ConsUS           205K
Cons Fed          75K
Cons ROW          65K
TrainingUS       170K
TrainingEU        50K
Total          1,685K   versus a PLAN and a FCST of $1.6M

Grand Total    3,955K   versus a PLAN of $4.2M and a FCST of $4.0M


Q2 is shaping up very well, and generally speaking activity is picking up in both US commercial and federal, and with OEMs.


We signed with NO HANDS last week and should sign with SARROS this week.


Mix Analysis (Q1 Outlook)

          Revenues   Number of   Number of   Number of   Average
                     accounts    VARs        A/O         transaction
Europe      860K        34           4           5          30K
ROW         260K        15           4           1          18K
US Com      830K        41          10           8          25K
US Fed      275K                     -           2          39K
OEMs         45K         -           1           -          50K
Total     2,270K       100          19          16          25K*

* Without A/O, which are typically less than $5K per transaction.
The average transaction dollar amount will increase during the course of the year because:

- VARs will start generating business (they are currently pulling the average down because of the low initial API investment they make, typically $5K to $7K, and we are not asking for any royalty prepayment).

- Many customers are waiting for our client/server product to deploy the application in a big way.

- Some Notes agents transactions will be very big, e.g. a minimum of 1,000 users at $100 per seat.

The FY PLAN reflects this trend through an increase in sales productivity.

By FY year end, every salesperson should be generating at least $200K in revenues per quarter, with an average transaction in the $50K range.

FY96 should see another significant productivity gain due to transactions getting bigger, the transaction cycle getting shorter, and the channel being at full production.

We saw the same phenomenon at cc:Mail, where we started with an average of $15K per transaction and three years later were at more than $100K per transaction, with some in the multimillion-dollar range.


Please  let  me  know  if  you  need  more  info. 


Thanks  Philippe 


Note for recipients who do not know Steve: he is one of the partners at Trident Capital (a potential investor in the new and last round of financing) doing the due diligence on Verity.
5 Surbiton Hill Road
Surbiton, Surrey
KT6 4TW

TEL: 081-390-3330 FAX: 081-390-3334


To: 

Philippe  Courtot 

Organisation: 

Verity

Fax  Number: 

Mtn  View 

From: 

Stephen  Cole 

Date: 

Tuesday, July 5, 1994

Subject: 

UK Outlook

No  of  Pages: 

3 including front sheet



We are looking at 615K for Q1. The risk is one opportunity - HM Customs and Excise. This may slip to Sept. if the project boards are missed. The rest is solid. We have other opportunities if this slips.

Q2 is looking very promising: 722K with good opportunities.

Peter Bolton is shaping up very well - he has closed his first orders. I am replacing Marc Adams in the next two weeks, as Marc did not perform. The delinquent accounts list is unacceptable, and Eric and I are working on it - I expect to see improvement by the end of

The South Africans are very positive after Chris de Wet attended DASH - they have substantial opportunities, and the results will show in Q2.
SUBJECT:

Please find enclosed my updated Outlook report for Q1.

Last week was a good week in France.

We had our first order in the banking market, with the installation of TOPIC on top of Lotus Notes for a major reference, INDOSUEZ, and also our first [...], for a total of $22K.

This contact was made at the Sybase forum on June 8th. We invited the prospect to the seminar, then confirmed a meeting with them on June 27th, and finally received their order on June 30th, with the corresponding paperwork in early July.

The conclusion of this deal shows how to sell with the help of partners. This also confirms the interest within major Notes accounts in additional retrieval functionality.


Good  reception. 
REST-OF-WORLD TERRITORY
Q1 FY95 REVENUE SUMMARY
JULY 4, 1994

                      FORECAST   PLAN   OUTLOOK   (Australia   Other)
LICENSE                  200      300      258          92        166
NEW MAINTENANCE                             26
CONSULTING/TRAINING                         64
TOTAL                                      348

Upside: 75K (BHP Newcastle, BHP Project M, Hong Kong Patents, Unisys)

*Re-activated RTA: 21K recognizable maintenance, invoiced
*DIO: 19K recognizable maintenance, check received
Report Quality Evaluation

To our sponsors:

To ensure that the highest standards of report quality are maintained, INPUT would appreciate your assessment of this report. Please take a moment to provide your evaluation of the usefulness and quality of this study. When complete, simply fax to INPUT at (650) 961-3966.

Thank you.

1. Report title: Evaluation of Baan Services Providers in North America

2. Please indicate your reason for reading this report:

□ Required reading            □ New product development     □ Future purchase decision
□ Area of high interest       □ Business/market planning    □ Systems planning
□ Area of general interest    □ Product planning            □ Other

3. Please indicate the extent to which the report has been used and its overall usefulness:

                                   Extent           Usefulness (1=Low, 5=High)
                                Read   Skimmed       1   2   3   4   5
Executive Overview               □       □           □   □   □   □   □
Complete report                  □       □           □   □   □   □   □
Part of report (   %)            □       □           □   □   □   □   □

4. How useful were:
Data presented                                       □   □   □   □   □
Analyses                                             □   □   □   □   □
Recommendations                                      □   □   □   □   □

5. How useful was the report in these areas:
Alerting you to new opportunities or approaches      □   □   □   □   □
Covering new areas not covered elsewhere             □   □   □   □   □
Confirming existing ideas                            □   □   □   □   □
Meeting expectations                                 □   □   □   □   □
Other                                                □   □   □   □   □

6. Which topics in the report were the most useful? Why?

7. In what ways could the report have been improved?

8. Other comments or suggestions:

Name

Title

Department

Company

Address

Country

Telephone

Date completed

Thank you for your time and cooperation.